atom.xml

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Wenzhong&#39;s Playground</title>
  
  <subtitle>half coder, full player</subtitle>
  <link href="/atom.xml" rel="self"/>
  
  <link href="http://fwz.github.io/"/>
  <updated>2019-07-16T02:13:35.064Z</updated>
  <id>http://fwz.github.io/</id>
  
  <author>
    <name>Wenzhong</name>
    
  </author>
  
  <generator uri="http://hexo.io/">Hexo</generator>
  
  <entry>
    <title>Product Manager: An Engineer&#39;s Perspective</title>
    <link href="http://fwz.github.io/2019/06/24/product-manager-an-engineer&#39;s-perspective/"/>
    <id>http://fwz.github.io/2019/06/24/product-manager-an-engineer&#39;s-perspective/</id>
    <published>2019-06-24T02:21:20.000Z</published>
    <updated>2019-07-16T02:13:35.064Z</updated>
    
    <content type="html"><![CDATA[<p><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/product-team.png" alt="Product manager"></p><p>I recently started an online course on Geektime and came across <a href="https://time.geekbang.org/column/article/0?cid=80" target="_blank" rel="noopener">an enlightening lecture</a> by Xiaoyin Qu, a Product Manager at Facebook. This article collects the lessons and opinions that I found useful.</p><a id="more"></a><h3 id="General-Responsibility"><a href="#General-Responsibility" class="headerlink" title="General Responsibility"></a>General Responsibility</h3><ul><li>The responsibility of a product manager is to lead a team to effectively deliver products that match users’ needs.</li><li>The product manager is the owner of a product. They design the product, its roadmap and its strategy.</li><li>Influence people rather than manage them. A basic requirement of a PM is getting others to buy into your idea / project.</li><li>The primary work of a great PM is not writing product requirement documents. Planning long-term strategy and building an efficient, motivated team take a considerable portion of the time.</li><li>Product manager vs project manager: <ul><li>A product manager helps the product team make decisions and deliver a product that earns users’ love and trust. </li><li>A project manager helps the product team deliver complicated projects with limited resources and time. </li><li>The product manager does the right thing; the project manager helps do things right.</li></ul></li><li>Leadership / Communication / Innovation / Execution are the four elements of success.</li></ul><h3 id="User-Requirement"><a href="#User-Requirement" class="headerlink" title="User Requirement"></a>User Requirement</h3><ul><li>Product requirements should always stand for users’ interests and address a pain point that really exists. 
Segway / blockchain / AI for online celebrities are examples of things that currently shine but do not solve a real problem.</li><li>Pursuing user requirements:<ul><li>Target a user group.</li><li>Draft some assumptions about user pain points via investigation and data analysis.</li><li>Dive deep into data and feedback to verify that the pain points exist.</li><li>Repeat steps 2 &amp; 3.</li></ul></li><li>Don’t try to design something for everyone at first. Solving 100% of the problems of a small but specific user group outweighs solving 50% of the problems of a large group.</li><li>Common targeting mistakes:<ul><li>Targeting only part of the target customer group.</li><li>Targeting the surface requirement, not the core one.</li></ul></li><li>The standard user story: As a (xxx), I want to (do yyy something) to help me achieve (zzz goals).<ul><li>xxx is the target user.</li><li>yyy is the user requirement.</li><li>zzz is the user problem being solved.</li></ul></li><li>Use case framework<ul><li>Consider entry / reward / architecture reuse / example.</li></ul></li><li>Minimum Valuable Product (MVP)<ul><li>Minimum function means minimum cost.</li><li>Valuable means it solves the problem.</li><li>“In this project, we can modify the requirement to make it compatible with the current infrastructure”. That’s what a wise product manager would say about an MVP.</li></ul></li><li>Defining OKRs<ul><li>OKRs should be concrete enough (including deadline and metrics). <ul><li>“Make users spend more time in the App” is not a good metric. </li><li>“DAU reaches 1 million by the end of 2019” is.</li></ul></li><li>Identify key metrics. 
For e-commerce, the number of registered users is not critical; the number of users who place orders is.</li><li>Take <strong>counter metrics</strong> into consideration.</li><li>Rank the priorities of multiple OKRs in case a compromise is needed.</li></ul></li></ul><h3 id="Basic-requirement"><a href="#Basic-requirement" class="headerlink" title="Basic requirement"></a>Basic requirement</h3><h3 id="Writing-PRD"><a href="#Writing-PRD" class="headerlink" title="Writing PRD"></a>Writing PRD</h3><ul><li>Describe the problem to solve (in different scenarios).</li><li>Establish that the problem exists.</li><li>Design success metrics and counter metrics.</li><li>Explain the function in detail, including:<ul><li>the entry point of the function.</li><li>a flow chart of the function.</li><li>whether any current product framework / architecture can be reused.</li><li>if cross-team cooperation is needed, arrange it in advance and note it in the PRD.</li><li>the fallback strategy (when things do not go as expected).</li></ul></li></ul><h3 id="Requirement-Discussion"><a href="#Requirement-Discussion" class="headerlink" title="Requirement Discussion"></a>Requirement Discussion</h3><ul><li>Set a reasonable deadline for decision making. People tend to be deadline driven.</li><li>Use the kickoff meeting to form sub-groups. Sub-group leaders take assignments and get them done.</li><li>Plan each meeting with an agenda and a time allocation for each item. Stick to it during the meeting.</li><li>An <strong>argument</strong> can be settled by aligning both sides’ decision-making criteria.</li><li>Stating the purpose of the meeting in the agenda makes for an effective meeting.</li><li>Discuss early and share background, so there are more chances to find a cost-effective way to finish a task.</li><li>Letting engineers design the product details is effective. 
(IMO, it’s true, but the decision should be communicated to the PM in case some context is missing.)</li><li>Take responsibility when some effort is wasted.</li></ul><h3 id="Time-management"><a href="#Time-management" class="headerlink" title="Time management"></a>Time management</h3><ul><li>Write down the 3 most important tasks of the day before starting work.</li></ul><h3 id="Growth"><a href="#Growth" class="headerlink" title="Growth"></a>Growth</h3><ul><li><strong>Core Formula: Growth = Acquisition + Retention + Resurrection</strong>. Refer to this formula when setting goals and planning strategy. </li><li>Make sure everyone on the team knows the growth goal.</li><li>Track every relevant A/B test (hypothesis / engineering progress / traffic portion / test duration / owner).</li><li>Non-positive tests are also a treasure for the team.</li><li>Use a dashboard / showcase to display core metrics. Monitor the metrics.</li><li>The PM should train the team to be more conscious of growth.</li><li>Workflow of growth:<ol><li>defining the metric to improve.</li><li>understanding the current product.</li><li>digging out relevant opportunities and forming hypotheses.</li><li>brainstorming solutions for each hypothesis.</li><li>running tests, finding the best solution and releasing it.</li></ol></li><li>Rules of “Understand, Identify &amp; Execute”<ul><li>Don’t skip “Understand” even when something has already been identified. Identifying a solution and starting execution without understanding the whole picture of the product risks missing the best idea, even though the current one may still bring a positive change to the product.</li></ul></li><li>Write a long-term mission statement. Make sure the product sticks to it, no matter how aggressive the growth goals are.</li><li>Anti-metrics that protect the mission statement are core metrics during growth. Quantify anti-metrics; missing an anti-metric is also a failure. 
</li></ul><h3 id="Execution"><a href="#Execution" class="headerlink" title="Execution"></a>Execution</h3><ul><li>Build a <strong>result-driven</strong> team in which everyone has a <strong>clear responsibility boundary</strong>.<ul><li>Everyone knows everyone else’s responsibility.</li><li>Work output can be evaluated.</li><li>Anyone who encounters a problem should ask for help, and the team should help solve it. If a problem keeps blocking progress, report it to the product manager in time.</li><li>Stealing credit is prohibited.</li></ul></li><li><strong>Control the duration of the decision-making process</strong><ul><li>Not every decision has to be made by everyone.</li><li>Every decision has its own “argue time range”. The length of the range is decided by the importance of the decision. The most important ones deserve more discussion. For ordinary decisions, a conclusion should be reached quickly for the sake of execution. <ul><li>To settle an argument, wording like this might be useful: “This decision is not that important and we don’t have to spend so much time on it. There is still room for later adjustment according to user feedback.”</li></ul></li><li>Make sure everyone on the team knows the priorities and works according to them.</li></ul></li></ul>]]></content>
    
    <summary type="html">
    
      &lt;p&gt;&lt;img src=&quot;https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/product-team.png&quot; alt=&quot;Product manager&quot;&gt;&lt;/p&gt;
&lt;p&gt;I recently started an online course on Geektime and came across &lt;a href=&quot;https://time.geekbang.org/column/article/0?cid=80&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;an enlightening lecture&lt;/a&gt; by Xiaoyin Qu, a Product Manager at Facebook. This article collects the lessons and opinions that I found useful.&lt;/p&gt;
    
    </summary>
    
      <category term="Read &amp; Learn" scheme="http://fwz.github.io/categories/Read-Learn/"/>
    
      <category term="Book Review" scheme="http://fwz.github.io/categories/Read-Learn/Book-Review/"/>
    
    
  </entry>
  
  <entry>
    <title>Future College (未来大学): Wenzhong&#39;s Career Story</title>
    <link href="http://fwz.github.io/2019/05/02/%E6%9C%AA%E6%9D%A5%E5%A4%A7%E5%AD%A6%EF%BC%9A%E6%96%87%E4%B8%AD%E7%9A%84%E8%81%8C%E5%9C%BA%E7%94%9F%E5%91%BD%E6%95%85%E4%BA%8B/"/>
    <id>http://fwz.github.io/2019/05/02/未来大学：文中的职场生命故事/</id>
    <published>2019-05-01T16:28:22.000Z</published>
    <updated>2019-06-10T17:10:44.720Z</updated>
    
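    <content type="html"><![CDATA[<p><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/future_college_1.png" alt="生命故事"></p><p>At the invitation of Future College (未来大学), I taught a session for its students. One part of it introduced my own “career story”. I chose the theme “how to build a sense of initiative, find a clear direction and transform quickly”.</p><p>The course was planned by Beichen Youth (北辰青年), and its content was co-created by Xinrui and Wenzhong.</p><a id="more"></a><h3 id="Part-1"><a href="#Part-1" class="headerlink" title="Part 1"></a>Part 1</h3><p>Today I would like to share the most important setback of my career so far.</p><p>In the spring of 2012 I finished my master’s degree and joined Yahoo. At that time my department was about to build a new “personalized recommendation system” to power recommendations on Yahoo’s homepages worldwide.</p><p>A recommendation system is a very large and complex project: it learns your interests from your browsing behavior and tries to recommend content that is rich and varied yet still appealing to you. Yahoo’s homepage received traffic on the order of a hundred million visits a day, which also put considerable pressure on system performance. For the Beijing team, this was a very exciting engineering task.</p><p>My direct manager assigned me to the data system that served the recommendation system. Its main job was to help evaluate how well the recommendations worked, and the tech lead responsible for the data system became my mentor.</p><p>In the first six months after joining, everything felt new to me. I was surrounded by unfamiliar but fascinating knowledge and by friendly, senior colleagues, and I studied and worked hungrily. For me, the technical growth in those six months was enormous.</p><p>Before long it was time for the annual performance review.</p><p>As a newcomer I did not really know what a performance review meant, so in my report I proudly summarized “my growth over the past six months”. After all, my technical skills had improved a great deal in that time; I might not have been fully satisfied, but I certainly had not wasted it.</p><p>Some time later I received my performance feedback. The overall rating was: Occasional Miss.</p><p>What does Occasional Miss mean? I later learned that at a foreign company this rating means your work has not reached the passing line, although it is not yet so hopeless that you need to be removed from the team (in other words, the second-lowest tier).</p><p>It was like spending half a year preparing for an exam, expecting at least 85 points, and getting only 50.</p><p>I felt quite wronged when I got the result. I had thrown myself into the work for those six months and felt I had let down neither myself nor the team, so why had I failed?</p><p>After turning it over in my mind, still somewhat indignant, I badly wanted my Manager (the department manager) to tell me why I had received this rating, or at least how I could improve and how I could earn a rating that matched my effort.</p><p>The team had just been reorganized and my original Manager had moved to another department, but I still tried to book a one-on-one conversation with him.</p><p>The conversation lasted half an hour, and I came away having understood a great deal.</p><p>From my Manager’s point of view, he had given me that rating for the following reasons:</p><ul><li>I had once taken part in an external communication where my mind went blank and I asked a fairly basic question, and my Manager immediately stepped in to smooth things over.</li><li>In my case the rating was largely relative. I may have done reasonably well, but colleagues who joined at the same time had shown more impressive results.</li></ul><p>Back at my desk, I thought a little further: the global homepage recommendation project was huge, and at that stage the whole department was focused on building the recommendation system, so he could hardly pay attention to every team and every individual. On top of that, the person directly guiding me was my tech lead, so the Manager did not necessarily know much about what I was working on or what I had delivered. Besides, he naturally also had to look after the people who worked with him directly.</p><p>So, to sum up:</p><ol><li>The Manager did not know what I was doing; </li><li>The Manager had seen my unprofessional side;</li><li>I had not shown any competitive advantage;<br>so I received a result that I felt I did not deserve but that was, in fact, quite reasonable.</li></ol><p>At the time life did taste a little bitter: I had worked diligently and ended up with this. But sentiment aside, once I knew the reasons behind it, it was time to pull myself together and turn things around.</p>
<h3 id="Part-2"><a href="#Part-2" class="headerlink" title="Part 2"></a>Part 2</h3><p>I went over what my Manager had told me and the situation I was in, and reflected again:</p><p>First, I should not just bury my head in what I personally believed was right; I needed to take into account the goals and priorities of my Manager and his whole team. I should proactively take on things the department leaders considered important, or even thorny, and try to solve them.</p><p>Second, I should communicate upward more and seek more feedback, so that whenever my approach drifted off course I could correct it early, rather than passively receiving the verdict at performance-review time.</p><p>After that reflection I readjusted the way I worked. Over the following two quarters I proactively took on core tasks related to the recommendation system.</p><p>I had to work with scientists in the US to measure how long users stayed on the relevant recommendation products. Yahoo’s daily page views were then in the hundreds of millions, so the task meant analyzing and managing massive amounts of user-behavior data, which was a huge challenge for me. That period was tough and painful, but in the end I got through it.</p><p>My contact with my Manager changed a lot as well. I regularly met my new Manager one-on-one to understand the goals the team was currently pursuing and to hear feedback on my performance. Thanks to this active communication, my work stayed on track and aligned with the team’s priorities.</p><p>Half a year later I was delivering better results, communicating more closely with my Manager and behaving more professionally. These gains earned me the trust of the team and my Manager, and my performance rating soon rose to a reasonable level.</p><p>Glad as I was to have made the right choice, I also took a fresh look at myself: what should I do to contribute more to the team, and how could I become an indispensable member of it? </p><p>At this point I remembered Liu Weipeng’s article <a href="http://mindhacks.cn/2009/01/14/make-yourself-irreplacable/" target="_blank" rel="noopener">“What Really Makes You Irreplaceable and Gives You a Core Competitive Edge”</a>, and felt it was time to start exploring and building my own core competitive edge and to create a different kind of value for the team.</p><p>After weighing my personal interests, my skill tree and the team’s function, I chose “data visualization”. I then spent my weekends studying the relevant theory and cutting-edge programming tools, working out how to present the data users care about in a better way, like telling a story, how to better reveal the relationships within the data, and how to empower users to explore the data further.</p><p>After a while I had independently built a few small applications that showed the “data processing pipeline” and the “distribution of data storage consumption”, previously just dry numbers, in a clearer and more attractive way. I showed them to my Manager; she was delighted and immediately forwarded them to our users. The users liked them too and said, “This must go into our product. Wenzhong, good job!”</p><p>Later, at the company Hackday, I added a physics engine to the feature so that trends in the data were shown more naturally and realistically. The judges liked it and it won a regional grand prize. In the end the work was also selected as a poster at Tech Pulse, Yahoo’s annual technology conference.</p><p>From then on I gradually built my own distinctive influence in this area, and my career path became smoother and smoother as a result.</p>
<h3 id="Part-3"><a href="#Part-3" class="headerlink" title="Part 3"></a>Part 3</h3><p>Later still, I became a tech lead myself and started leading a team. I too began holding performance conversations and thinking about how to fire up the team’s morale, how to put people in the most suitable positions, and how to give team members more exposure and help them build a technical reputation. Gradually I found that a team really needs people who think proactively, offer their own views and act on their own initiative.</p><p>Someone once said that a captain jokes around with everyone in the cabin, but when he stands alone at the bow looking for the way ahead, he is lonely inside.</p><p>Besides setting the team’s overall goals in line with the company mission and leading the team to deliver, a Leader has to deal with many less visible problems, such as thinking about the team’s overall development and direction, making high-stakes decisions and cracking complex, difficult problems. There are always times when a Leader needs strong support. If a teammate steps forward to share some responsibility with the Leader and the team, or to handle a hard problem, the Leader will certainly trust and value that teammate more.</p><p>More often, a Leader can steer the core direction but hardly has the energy to follow every teammate’s work and state of mind in detail. For much of the concrete work, the Leader actually understands the difficulties less well than the people on the front line. This is when a Leader badly needs everyone to communicate and give feedback proactively.</p><p>So after becoming a Leader, I understood much better the rating my boss gave me in that review. I had wrongly assumed he would see my effort and my results. In fact, unless I reported to him and communicated with him in time, it was very hard for him to know me in any depth, and he could only infer and judge from one or two encounters with me.</p><p>Looking back now at my experience as a fresh graduate, I am very glad I met that small setback early. It is a great asset in my career.</p><p>The setback came neither too late nor too harshly. It gave me plenty of time to reflect and adjust, and to work out how to proactively keep up healthy communication with my team Leader; otherwise it is easy to keep your head down, work hard and drift off course.</p><p>It also made me realize that I should proactively seek out and build my own differentiating strengths, which brings more value to the team on the one hand and has greatly helped my own career development on the other.</p><p>Driven by this sense of initiative, that was also when I began to take on more of the team’s challenging tasks. Looking back, when I chose my work proactively instead of waiting passively for tasks to land on me, I gained more control, and to some extent even more tolerance and understanding.</p><p>The remarkable thing is that these challenging tasks can (in most cases) be moved forward smoothly and end well, as long as you find the right approach. As one task after another produces good results, our influence in the team keeps growing, and we gradually take on even more challenging work. It is often in this process that we improve our abilities faster, widen our responsibilities and get more opportunities to develop. It is a positive, self-reinforcing cycle.</p><h3 id="Summary"><a href="#Summary" class="headerlink" title="Summary"></a>Summary</h3><p>To sum up, early in my career I learned three kinds of initiative: “proactively communicate upward”, “proactively build your own differentiating strengths” and “proactively take on responsibility”. With this awareness and sense of ownership, you will find yourself growing along your career path at a speed others cannot fathom.</p><p>Meet a better future self. This is Future College. I am Wenzhong; thank you for reading.</p>]]></content>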
    
    <summary type="html">
    
      &lt;p&gt;&lt;img src=&quot;https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/future_college_1.png&quot; alt=&quot;生命故事&quot;&gt;&lt;/p&gt;
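&lt;p&gt;At the invitation of Future College (未来大学), I taught a session for its students. One part of it introduced my own “career story”. I chose the theme “how to build a sense of initiative, find a clear direction and transform quickly”.&lt;/p&gt;
&lt;p&gt;The course was planned by Beichen Youth (北辰青年), and its content was co-created by Xinrui and Wenzhong.&lt;/p&gt;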
    
    </summary>
    
      <category term="Career" scheme="http://fwz.github.io/categories/Career/"/>
    
      <category term="Growth" scheme="http://fwz.github.io/categories/Career/Growth/"/>
    
    
  </entry>
  
  <entry>
    <title>Sim Cloud: Designing the Simulation Platform for NIO Service Cloud</title>
    <link href="http://fwz.github.io/2019/02/20/sim-cloud/"/>
    <id>http://fwz.github.io/2019/02/20/sim-cloud/</id>
    <published>2019-02-20T15:02:54.000Z</published>
    <updated>2019-05-02T08:21:39.049Z</updated>
    
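    <content type="html"><![CDATA[<p><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/Sim-Cloud-Icon.png" alt=""></p><p>“One Click for Power” (一键加电) is NIO’s valet power-up service for car owners. After a user places an order in the App, the dispatch system solves for a combination of service specialist and power resource based on factors such as the desired pick-up location / pick-up time and the vehicle’s state (for example, remaining range). While solving, the system takes a series of geographic factors into account (such as traffic-restriction fences and road conditions), and then combines the availability of service staff, the service capacity and regeneration capacity of power resources, and the system configuration to find the plan with the best overall service distance and service duration.</p><p>There are two main kinds of power resources offered to users: battery swap stations and mobile charging vans. A swap station can remove the old battery and fit a new one within 3 minutes, which makes for a very good power-up experience. Mobile charging vans suit outer-suburb or low-battery scenarios. In areas where these resources are not yet densely deployed, we also use third-party charging piles.</p><p>Below is the service flow of NIO’s One Click for Power:<br><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/NSC-PE-SOP.png" alt="一键加电流程"></p><p>(Dashed boxes denote the service flow, green segments denote occupation of a resource, and blue segments denote occupation of a service specialist.)</p><p>As more vehicles are delivered, production demand for One Click for Power orders keeps growing. To safeguard the completion rate and on-time rate of the service, we have many optimization problems to solve, such as:</p><ul><li>Predicting how changes in the traffic model may affect service duration and staff utilization</li><li>Predicting the upper limit of the system’s service capacity under different supplies of staff and power resources</li><li>Making recommendations on how to deploy service staff and resources</li><li>Evaluating algorithms and parameter tunings before they go live</li></ul><p>These problems require the system’s dispatch performance to be measurable, and measurable offline.</p><p>For an O2O system that involves dispatching, service capacity is hard to derive mathematically from a simple model or linear formula in a way that yields meaningful metrics directly. To better evaluate and tune the system, and to advise on how front-line operational resources should be configured, we designed and built Sim Cloud: <strong>a simulation platform that can accurately rebuild the production system and environment, feed in production or custom traffic, estimate the system’s supply-and-demand capacity curve and evaluate the performance of dispatch strategies</strong>.</p><a id="more"></a><h2 id="名词简介"><a href="#名词简介" class="headerlink" title="名词简介"></a>Glossary</h2><ul><li>Service specialist: a specialist who performs the power-up service for users (“specialist” below).</li><li>Battery swap station: a battery replacement site developed in-house by NIO. It can swap the battery of any NIO model in 3-5 minutes and occupies about 3 regular parking spaces.</li><li>Charging pile: a device connected to the grid that charges an electric vehicle through a charging plug.</li><li>Charging pile cluster: charging piles are usually built in batches; a site where a batch is built and which provides parking space for charging and swapping is a cluster.</li><li>Mobile charging van: a mobile charging device developed in-house by NIO. The van carries its own battery and can use its own charge to charge other vehicles.</li><li>Power resource: below, specifically mobile charging vans, battery swap stations and charging pile clusters.</li><li>Resource: below, specifically specialists, mobile charging vans, battery swap stations and charging pile clusters.</li><li>SOC: <strong>S</strong>tate <strong>o</strong>f <strong>C</strong>harge, the battery pack’s current charge as a percentage of its total usable capacity.</li><li>Pick-up location: the location the user specifies, when ordering, for the specialist to collect the car.</li><li>ETA: <strong>E</strong>stimated <strong>T</strong>ime of <strong>A</strong>rrival, the estimated completion time (of a given event).</li><li>Service steps: the series of specialist activities needed to complete a One Click for Power order, including travelling to the car, locating the car at the pick-up point, driving to the service point, charging or swapping, and returning the car to its original location.</li></ul>
<h2 id="架构"><a href="#架构" class="headerlink" title="架构"></a>Architecture</h2><h3 id="目标和挑战"><a href="#目标和挑战" class="headerlink" title="目标和挑战"></a>Goals and Challenges</h3><p>As described above, we want Sim Cloud to simulate the production environment accurately and replay production order traffic in order to achieve the following business goals:</p><ol><li>Simulate the service capacity of the production system for a given number, state and distribution of resources.</li><li>Test how the production system’s service capacity behaves under different business configurations and algorithm strategies, and use the following metrics to guide operations staff and developers in adjusting configuration and dispatch algorithms to reach the best balance between dispatch efficiency and punctuality.<ul><li>System efficiency metrics [order success rate, maximum number of successful orders, specialist travel distance and service duration]</li><li>User satisfaction metrics [on-time rate, user-perceived service duration]</li></ul></li><li>By controlling variables, simulate how resource locations affect the system efficiency and user satisfaction metrics, to guide future planning of the geographic distribution and number of resources.</li></ol><p>Beyond these business goals, on the engineering side we also aim to:</p><ol><li>Increase the speed at which order traffic is simulated</li><li>Reduce development and maintenance costs</li><li>Guarantee simulation capability, so that all kinds of production events can be simulated</li></ol><p>Achieving these design goals brings some challenges. Across the whole simulation flow, deterministic simulation is our main concern, because whether a simulation is valid directly affects the experimental results. In addition, data security, simulation speed and repeatable simulation are also key to whether the Simulator succeeds. On that basis, we group the key problems the Simulator must solve as follows:</p><ul><li>Deterministic simulation<ul><li>Over an order’s complete life cycle from start to finish, the simulator must decide the next action and its time from the state at each step.</li><li>Pull production data safely and quickly, and restore the production state as faithfully as possible (specialists’ shifts, service capabilities, service cities, tools in use and starting locations; the distribution and state of power resources; and the state and location of specialists, power resources and user cars as recorded in each system).</li><li>Confidential information such as user cars, specialists, users and their phone numbers must be replaced before use, with all dependent data kept in sync.</li><li>Some services depend on in-car connectivity, certificates and the like, and cannot be reached from the simulation environment, so they have to be mocked.</li></ul></li><li>Speeding up the flow of time inside the simulator</li><li>Repeatable simulation: simulating how the production environment performs under different business configurations and algorithm strategies requires the ability to restore the production environment repeatedly.</li></ul><h3 id="整体流程介绍"><a href="#整体流程介绍" class="headerlink" title="整体流程介绍"></a>Overall Workflow</h3><p>A complete Sim Cloud simulation consists of the following steps:<br><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/Sim-Cloud-Arch-Workflow.png" alt="SIM 流程"></p><ol><li>Pull / prepare data</li><li>Initialize the environment</li><li>Replay traffic / events</li><li>Evaluate the results</li></ol><p>The problems each stage must solve, and how, are described in turn below:</p><h3 id="拉取-准备数据"><a href="#拉取-准备数据" class="headerlink" title="拉取 / 准备数据"></a>Pulling / Preparing Data</h3><p>Sim Cloud can currently generate experiment input (Simulation Input) in two ways: from the distribution of real production data, or from a distribution defined by the experimenter. By specifying different generation configurations, the experimenter can use the data generators to produce different experiment inputs.</p><p>Each generation method has its own generator that produces the corresponding simulation input data. Taking generation from the production traffic distribution as an example, the generator produces the following four kinds of data:</p><ul><li>Specialists: pull the shift and post data of production specialists</li><li>Power resources: pull production power resource data, including the state, location and service-capacity configuration of mobile charging vans, battery swap stations and charging pile clusters</li><li>Order intents: pull production order information, desensitize the user and vehicle information of every ordering user with data from a resource pool, and attach the order time</li><li>System and business configuration: pull a configuration snapshot of the core production systems</li></ul><p>In the following way, we can then begin to sync specialists, power resources, system and business configuration, user car information and so on into the simulation environment.</p><p><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/Traffic-Clone.png" alt="数据准备流程"></p><p>For a custom supply-and-demand distribution, we generate deterministic input and output from the parameters and allow the user to adjust the final result by hand.</p>
<h3 id="初始化环境"><a href="#初始化环境" class="headerlink" title="初始化环境"></a>Initializing the Environment</h3><p>Initializing the environment mainly involves two steps:</p><ol><li>Inject business data into the Sim Cloud environment (users / vehicles / service staff / power resources)</li><li>Initialize the configuration of each Sim Cloud subsystem</li></ol><p><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/Environment-Init.png" alt="初始化环境流程"></p><p>Once this is done, the environment context that Sim Cloud can perceive has been built.</p><h3 id="回放流量-事件"><a href="#回放流量-事件" class="headerlink" title="回放流量 / 事件"></a>Replaying Traffic / Events</h3><h4 id="动态事件模拟"><a href="#动态事件模拟" class="headerlink" title="动态事件模拟"></a>Dynamic Event Simulation</h4><p>Because the data pulled from production is a snapshot of one instant, initializing the simulation environment from it cannot cover certain dynamic events (for example specialists clocking in and out, binding / unbinding tools, system dispatch, service steps, users placing orders, and changes in the user car’s SOC and location at order time). By cause, these events fall into two kinds:</p><p><strong>Independent events</strong>: do not depend on other events and run at a fixed time, such as specialists clocking in and out, or system dispatch.<br><strong>Dependent events</strong>: depend on other events. For example, the events within the <code>service steps</code> each depend on the preceding step; after all, if <code>pick-up fails</code>, there will be no subsequent <code>power up at the service point</code> event.</p><p>We use an Event Queue that is automatically ordered by time to simulate dynamic events. Before a SIM run, some independent events and the order events are loaded into the Event Queue.<br>While running, SIM maintains an internal current time (unrelated to the real current time) and each time takes out and executes the Events due at that time. For dependent events, for example once the <code>pick-up</code> Event in the service steps has run successfully, the next service step and its trigger time are put into the Event Queue according to the returned result. When the Queue has no Events left, the SIM run stops.</p><p><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/Sim-Cloud-Simulation.png" alt="事件驱动"></p><h3 id="效果评估"><a href="#效果评估" class="headerlink" title="效果评估"></a>Evaluation</h3><p><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/Sim-Cloud-Evaluation.png" alt="效果评估"></p><p>Once a simulation has finished, the evaluation stage begins.</p><p>Evaluation shares its code base with the production metrics pipeline: it pulls data from the storage that holds the business data and aggregates the overall metrics.</p><h2 id="Tricks"><a href="#Tricks" class="headerlink" title="Tricks"></a>Tricks</h2><p>We hit quite a few pain points while developing Sim Cloud. Any simulation system may run into the same difficulties, so here is how we handled them.</p><h3 id="依赖管理"><a href="#依赖管理" class="headerlink" title="依赖管理"></a>Dependency Management</h3><p>To define system behavior precisely and obtain evaluation data, a simulation environment should ideally be isolated at both the system level and the data level. The difficulty is that running the system end to end involves too many dependencies. One Click for Power has about 10 core modules, a further 40-plus related service sub-modules / microservices, and possibly even more backend infrastructure (such as storage). Some systems also depend on other teams or departments, so coordinating and building the environment is time-consuming.</p><p>So far we have made a few changes to reduce dependencies as far as possible:</p><ol><li>For calls to non-core modules that do not affect dispatch behavior, a switch is configured in NSC to degrade them: on in production, off in simulation. When off, a default degraded behavior is used.</li><li>Modules that are hard to skip but not very stable are mocked by ourselves.</li></ol>
<p>After a period of optimization, the number of modules that must be maintained separately for the simulation environment fell to about 5. Compared with 40, that is a fairly comfortable state.</p><p>Some may ask whether we considered load testing in production instead. There are several considerations:</p><ul><li>Offline simulation and production load testing differ in frequency. A production load test may run only once or twice, each time with specific traffic. Offline simulation races the clock: a series of offline experiments may be queued and run at full speed 24/7, producing far more data than a load test.</li><li>Offline experiments modify the dispatch logic. That code exists mostly to validate ideas, so little time is spent on performance tuning or regression coverage of business logic. Even when only traffic is added, production load testing puts the stability of the production system at greater risk than offline simulation does.</li><li>Data tagging. With production load testing, data must be tagged along the entire call chain to avoid skewing the statistics.</li></ul><p>Of course, maintaining a separate environment has its own cost. For instance, every production release means work to keep Sim Cloud’s event queue handling compatible with the production version.</p><h3 id="仿真批量编排"><a href="#仿真批量编排" class="headerlink" title="仿真批量编排"></a>Batch Orchestration of Simulations</h3><p>To approach an optimization target efficiently, we need to orchestrate a series of simulations that validate different dispatch strategies and parameter settings, or even tune parameters automatically. The Sim Cloud front end offers DSL-like support: experimenters can generate, from their own ideas or needs, a series of configurations used to build the environment for each simulation, and submit them to the simulation manager as a batch. Once the manager has received the batch, it triggers the simulations automatically as instructed.</p><p>For example, to estimate the limit of production service capacity we need to keep adjusting the volume and distribution of input orders. In that case we can tell the system to pull production traffic from the last 10 Saturdays, specify 10 sampling ratios, and orchestrate 10 experiments, each of which compresses the sampled traffic into a single day, then submit them together. When all simulations have finished, the report page shows their results side by side.</p><h3 id="确定性"><a href="#确定性" class="headerlink" title="确定性"></a>Determinism</h3><p>Our basic requirement for simulation is that <strong>the same input produces the same output</strong>. An O2O business system has many degrees of freedom and sources of randomness that must be managed or normalized, for example:</p><ol><li>Randomness when generating simulation events from a data distribution. When the initiator specifies an input distribution, the same input must yield the same output or sample, which can usually be guaranteed by fixing the random seed.</li><li>Randomness in the implementation of the dispatch system. For the same context, the implementation must map inputs to outputs deterministically. This is a normal requirement for a business system, but it must also hold under concurrency and contention for resources.</li><li>Mismatch between the events defined in the simulation system and the clock of the dispatch system. The time at which an event is meant to occur certainly differs from the real time at which it is fed into the simulation, so the simulation system must be able to <strong>specify and apply</strong> an event’s “occurrence time” internally. We attach time information to every simulation event, and every interface of the dispatch and resource systems gains, via AOP, a common <code>custom_timestamp</code> parameter that accepts a specific time from outside. The dispatch system then wraps a Helper function for getting the current time: when a call carries <code>custom_timestamp</code>, <code>Helper.getCurrentDate()</code> returns the custom time instead of <code>new Date()</code>, which lets the system accept a caller-specified time non-intrusively. To guard against human oversight, we also added a unit test that scans the code for <code>new Date()</code>, so that no new code written without knowing this rule reaches the main branch and silently introduces nondeterminism.</li></ol><h3 id="仿真效率"><a href="#仿真效率" class="headerlink" title="仿真效率"></a>Simulation Efficiency</h3><ul><li>The independent clock described above also speeds things up: the Simulator no longer waits for real time to pass. As long as the event queue is kept ordered by timestamp, executing events in order yields the same sequence and data changes as the real world, which greatly speeds up a simulation run.</li><li>Like many time-sensitive workloads (crawlers, for example), Sim Cloud does not want to sit idle. Evaluation (a Spark Job) depends on the business data a simulation produces, yet that data should be cleared before the next simulation starts, so evaluating directly on the business system’s storage would block the next run. To improve efficiency we first dump the business data to a separate store and then notify two parties in parallel: 1. the Evaluation module evaluates using the data in the separate store; 2. the Simulator clears the data from the previous run and starts the next one. Statistics and simulation do not block each other, which saves considerable time across consecutive simulations.</li></ul>
<h2 id="小结"><a href="#小结" class="headerlink" title="小结"></a>Summary</h2><p>Current features of Sim Cloud:</p><ul><li>High-fidelity reconstruction of the production environment, with replay of specified order traffic and evaluation of system efficiency metrics</li><li>Fast simulation: a day of order traffic is replayed within 5 minutes</li><li>An experiment manager that supports configuration-based, parameterized experiments</li><li>Comparison between different experiments</li></ul><p>NIO’s One Click for Power project won NIO’s 2018 annual value achievement award for “User Experience Beyond Expectations”. Sim Cloud played a pivotal role in continuously optimizing the dispatch strategy and shortening service duration.</p>]]></content>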
    
    <summary type="html">
    
      &lt;p&gt;&lt;img src=&quot;https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/Sim-Cloud-Icon.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
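&lt;p&gt;“One Click for Power” (一键加电) is NIO’s valet power-up service for car owners. After a user places an order in the App, the dispatch system solves for a combination of service specialist and power resource based on factors such as the desired pick-up location / pick-up time and the vehicle’s state (for example, remaining range). While solving, the system takes a series of geographic factors into account (such as traffic-restriction fences and road conditions), and then combines the availability of service staff, the service capacity and regeneration capacity of power resources, and the system configuration to find the plan with the best overall service distance and service duration.&lt;/p&gt;
&lt;p&gt;There are two main kinds of power resources offered to users: battery swap stations and mobile charging vans. A swap station can remove the old battery and fit a new one within 3 minutes, which makes for a very good power-up experience. Mobile charging vans suit outer-suburb or low-battery scenarios. In areas where these resources are not yet densely deployed, we also use third-party charging piles.&lt;/p&gt;
&lt;p&gt;Below is the service flow of NIO’s One Click for Power:&lt;br&gt;&lt;img src=&quot;https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/NSC-PE-SOP.png&quot; alt=&quot;一键加电流程&quot;&gt;&lt;/p&gt;
&lt;p&gt;(Dashed boxes denote the service flow, green segments denote occupation of a resource, and blue segments denote occupation of a service specialist.)&lt;/p&gt;
&lt;p&gt;As more vehicles are delivered, production demand for One Click for Power orders keeps growing. To safeguard the completion rate and on-time rate of the service, we have many optimization problems to solve, such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Predicting how changes in the traffic model may affect service duration and staff utilization&lt;/li&gt;
&lt;li&gt;Predicting the upper limit of the system’s service capacity under different supplies of staff and power resources&lt;/li&gt;
&lt;li&gt;Making recommendations on how to deploy service staff and resources&lt;/li&gt;
&lt;li&gt;Evaluating algorithms and parameter tunings before they go live&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These problems require the system’s dispatch performance to be measurable, and measurable offline.&lt;/p&gt;
&lt;p&gt;For an O2O system that involves dispatching, service capacity is hard to derive mathematically from a simple model or linear formula in a way that yields meaningful metrics directly. To better evaluate and tune the system, and to advise on how front-line operational resources should be configured, we designed and built Sim Cloud: &lt;strong&gt;a simulation platform that can accurately rebuild the production system and environment, feed in production or custom traffic, estimate the system’s supply-and-demand capacity curve and evaluate the performance of dispatch strategies&lt;/strong&gt;.&lt;/p&gt;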
    
    </summary>
    
      <category term="Engineering" scheme="http://fwz.github.io/categories/Engineering/"/>
    
      <category term="Simulation" scheme="http://fwz.github.io/categories/Engineering/Simulation/"/>
    
    
      <category term="Simulation, NIO" scheme="http://fwz.github.io/tags/Simulation-NIO/"/>
    
  </entry>
  
  <entry>
    <title>Practical Apache Kafka Usage</title>
    <link href="http://fwz.github.io/2018/07/29/Practical-Kafka-Usage/"/>
    <id>http://fwz.github.io/2018/07/29/Practical-Kafka-Usage/</id>
    <published>2018-07-28T16:24:02.000Z</published>
    <updated>2019-05-02T17:20:41.772Z</updated>
    
    <content type="html"><![CDATA[<p><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/franz-kafka-en-1906.jpg" alt="Kafka"></p><p>I collect some of Apache Kafka usage in our team and write an internal post in Chinese. And I would like to pull the fundamental part into this post.</p><a id="more"></a><h2 id="重要配置"><a href="#重要配置" class="headerlink" title="重要配置"></a>重要配置</h2><h3 id="Producer"><a href="#Producer" class="headerlink" title="Producer"></a>Producer</h3><p>生产者有很多可以配置的参数，大部分有合理的默认值，详情可以参考 Kafka 的文档。这里摘录了部分会影响生产环境中数据正确性的关键参数，并进一步解释如下。</p><table><thead><tr><th>parameters</th><th>description</th></tr></thead><tbody><tr><td>acks</td><td>指定有多少个分区副本确认收到消息，生产者才会认为消息写入是成功的。<br> 如果 ack = 0, 那么生产者在成功写入消息之前不会等待任何来自服务器的响应。假如消息发送过程中出现问题，导致服务器没有收到消息，生产者也无从得知，消息也会丢失。这种模式适合高吞吐量但对可靠性没有很高要求的场景（例如埋点数据的收集）但由于不需要等待服务器的响应，所以能够达到很高的吞吐量。（注意此时 retries 参数也不会起作用）<br>如果 ack = 1, 那么只要集群的 Leader 收到消息，生产者就认为服务器成功接收消息。如果消息无法达到首领节点，那么生产者会收到错误响应并进行重试。但这里可能出现的问题是，如果一个没有收到消息的节点在旧首领下线以后成为新的首领，消息还是会丢失。<br>如果 ack = all, 当所有参与复制的节点全部收到消息时，生产者才会收到一个来自服务器的成功响应。这种模式是最为安全的，适合严格要求消息不能丢失的情况（例如支付、订单等相关消息）。然而由于要等待所有节点接收消息，所以延迟会比 ack = 1 时更高。</td></tr><tr><td>retries</td><td>决定 Producer 在临时性的错误下的重试次数，如果达到这个次数， Producer 会放弃重试并返回错误。每次重试之间的间隔，也可以通过 retry.backoff.ms 来配置。重试次数和重试间隔的乘积，应该大于一个常规的节点崩溃并回复的所需时间（例如所有分区选举出首领需要多少时间），以避免 Producer 过早放弃重试。一般情况下，我们在代码中只需要处理那些无法通过重试成功发送的错误，或重试次数超出上限以后的情况。</td></tr><tr><td>linger.ms</td><td>Producer 会在 batch 被填满或者当前 batch 已经等待了 linger.ms 时，把 batch 中的所有消息一起发送出去。对于吞吐量较小的主题，这个值会影响消息的延迟。</td></tr><tr><td>max.in.flight.requests.per.connection</td><td>指定 Producer 在收到服务器响应之前可以发送多少消息。默认为5，提高此值可以提高吞吐量。但当值大于1时，在发送出现失败后重试时，将有几率无法保证消息按照发送的顺序写入服务器。在高吞吐量且无需关心数据顺序（例如日志或埋点记录）时，可适量提高。</td></tr><tr><td>request.timeout.ms</td><td>Producer 在发送数据时等待服务器返回响应的时间</td></tr><tr><td>timeout.ms</td><td>指定 broker在等待同步副本返回消息确认的时间（与 acks 配置相互作用），如果在指定时间内没有收到同步副本的确认，broker会抛出错误。</td></tr><tr><td>enable.idempotence</td><td>设置为true时，Producer会保证消息只发送一次。设置为 False 
时，重试就可能会引起多条数据发送到集群中。注意， enabling idempotence 要求 max.in.flight.requests.per.connection = 1 且 retries &gt; 0 且 acks = ‘all’</td></tr></tbody></table><h3 id="Consumer"><a href="#Consumer" class="headerlink" title="Consumer"></a>Consumer</h3><table><thead><tr><th>parameters</th><th>description</th></tr></thead><tbody><tr><td>fetch.min.wait.ms</td><td>用于描述broker的等待时间，默认是500ms。如果没有足够的数据（fetch.min.bytes)流入，那么消费者会等待该参数指定的时间，然后返回所有可用的数据</td></tr><tr><td>max.partition.fetch.bytes</td><td>指定分区返回给消费者的最大字节数，默认为1MB。 KafkaConsumer.poll() 方法从每个分区里面返回的记录，不会超过配置中返回的字节数。如果一个主题有20个分区、5个消费者，那么每个消费者需要至少4MB的可用内存来接收记录。</td></tr><tr><td>session.timeout.ms</td><td>consumer 与 coordinator 之间 session 超时时间，heartbeat request 将刷新此 timeout。如果 coordinator 发现 session 超时，将触发 rebalance</td></tr><tr><td>enable.auto.commit</td><td>如果设置为 true，那么 poll 方法中接收到的最大偏移量就会被提交。提交间隔由 auto.commit.interval.ms 控制。自动提交是在轮询里进行的，假如每次在轮询时会检查是否改提交偏移量，如果是</td></tr><tr><td>max.poll.records</td><td>控制单次轮询处理的记录数</td></tr><tr><td>max.poll.interval.ms</td><td>consumer 两次 poll 消息之间的最大间隔时间，poll request 将刷新此 timeout。如果 consumer 发现两次 poll 中间间隔时间超出此值，将主动发出 leave group 请求，该请求会使得此 consumer 离开所在的 consumer group，并触发 rebalance （注：如果 consumer 主动离开所在的 consumer group， 那么将会暂停 heartbeat request 的发送，下一次 poll 发生时会尝试重新加入 group） 同时改值也代表着 rebalance timeout，在一次 rebalance 中，如果在该时间内 consumer 未 join group，那么 consumer 将被认为离开了 group</td></tr><tr><td>heartbeat.interval.ms</td><td>consumer 两次发起 heartbeat 请求之间的间隔。该请求会刷新 session timeout 时间。如果 coordinator 发起了 rebalance，consumer 会通过该请求得知 rebalance 的发生，并发送 join group request 加入该 group</td></tr></tbody></table><h3 id="可靠性"><a href="#可靠性" class="headerlink" title="可靠性"></a>可靠性</h3><p>在讨论消息系统的可靠性时，要先考虑两个问题：</p><ul><li>是否需要保证消息的可靠发送和消费，不丢数据。</li><li>是否需要保证消息的发送和业务数据的一致性 （一般是最终一致性，不要发送错误/不一致的数据）</li></ul><p>假如 1 的答案为「是」，那么请继续阅读「Producer 可靠性 / Consumer 可靠性」章节；假如 2 中的答案为「是」，可以参考「事务型消息」章节来进行处理。</p><h4 id="Producer-可靠性"><a href="#Producer-可靠性" class="headerlink" title="Producer 可靠性"></a>Producer 可靠性</h4><p>Producer 
需要处理的错误包括两部分：Producer 可以自动处理的错误 / Producer 无法自动处理，需要开发者单独处理的错误。</p><p>Producer 可以自动处理的错误，依赖于 Kafka 集群的自愈机制，例如 Leader Election / 网络问题。在这些情况下，Kafka 是可以在几秒到几十秒之间得到解决。而假如返回「INVALID_CONFIG」/ 「认证错误」之类的错误，那就不是重试可以解决的了，此时需要开发者单独处理。</p><p>大部分情况下，我们希望不丢失消息，那么最好让 Producer 在遇到可重试错误时能够保持重试，这样开发者就不需要加入额外手段去处理这些问题。</p><p>重试的过程中，Producer有一定可能向 broker 写入不止一条的消息：例如由于网络原因，消息实际已经写入，但 Broker 没有在超时时间内返回确认给 Producer ，那么 Producer 就会重试。但如下文所述，假如 Consumer 本身或业务本身可以做到消息消费的幂等性，那 Producer 就可以放心地使用内置的重试机制来重发消息。</p><h4 id="Consumer-可靠性"><a href="#Consumer-可靠性" class="headerlink" title="Consumer 可靠性"></a>Consumer 可靠性</h4><p>在服务重启的时候，Consumer 就会有一段时间无法工作，或引起 re-balance，在这个过程中 Kafka 是如何保证消息会被一定可以被消费的呢？</p><p>Consumer 设置中最重要的一个参数，就是 enable.auto.commit。对可靠性有要求的业务，强烈建议将这个偏移量设置为手动提交。并在业务代码中贯彻以下的几个思想。</p><ul><li>先处理，后提交偏移量。</li><li>若处理失败，留存消息进行补偿处理或报警（不阻塞后续消费），提交偏移量。</li><li>对消息的处理支持幂等。</li></ul><h4 id="Consumer-Rebalance"><a href="#Consumer-Rebalance" class="headerlink" title="Consumer Rebalance"></a>Consumer Rebalance</h4><p>子系统之间的交互大量的依赖于消息组件 Kafka 的支持。由此引发的一个问题是，当同一个 consumer group 中的机器集群的某一台机器宕机后，Kafka 以何种机制触发 rebalance，在多长的时间内可以将宕机 consumer 所分配的 partitions 重新分配，并使得消息可以继续得以消费。</p><p>Kafka Consumer Rebalance 过程</p><ol><li>Coordinator 通过在 HeartbeatResponse 中返回 IllegalGeneration 错误码发起 Rebalance 操作。</li><li>Consumer 发送 JoinGroupRequest</li><li>Coordinator 在 Zookeeper 中增加 Group 的 Generation ID 并将新的 Partition 分配情况写入 Zookeeper</li><li>Coordinator 发送 JoinGroupResponse</li></ol><p>综上所述，如果 Consumer宕机。我们可以假设 Consumer 与 Coordinator 网络请求耗时 = Ti，本地操作用时忽略不计，则一次 rebalance 从某个 consumer 宕机到整个 rebalance 完成的最大可能时长为：</p><p>R = session.timeout.ms + heartbeat.interval.ms + max{Ti}</p><p>在此期间，宕机的 consumer 所被分配到的 partitions 的消息无法被处理，直至 rebalance 完成后，迟滞的消息将由新的 consumer 处理。</p><p>Rebalance 相关内容参考</p><ul><li><a href="https://stackoverflow.com/questions/43991845/kafka10-1-heartbeat-interval-ms-session-timeout-ms-and-max-poll-interval-ms" target="_blank" 
rel="noopener">https://stackoverflow.com/questions/43991845/kafka10-1-heartbeat-interval-ms-session-timeout-ms-and-max-poll-interval-ms</a></li><li><a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-62%3A+Allow+consumer+to+send+heartbeats+from+a+background+thread" target="_blank" rel="noopener">https://cwiki.apache.org/confluence/display/KAFKA/KIP-62%3A+Allow+consumer+to+send+heartbeats+from+a+background+thread</a></li><li><a href="http://www.infoq.com/cn/articles/kafka-analysis-part-4?utm_source=infoq&amp;utm_campaign=user_page&amp;utm_medium=link" target="_blank" rel="noopener">http://www.infoq.com/cn/articles/kafka-analysis-part-4?utm_source=infoq&amp;utm_campaign=user_page&amp;utm_medium=link</a></li><li><a href="https://github.com/apache/kafka/tree/0.11.0" target="_blank" rel="noopener">https://github.com/apache/kafka/tree/0.11.0</a></li></ul><h3 id="事务型消息"><a href="#事务型消息" class="headerlink" title="事务型消息"></a>Transactional Messages</h3><p>Systems that decouple services and need highly reliable updates of business state require a message queue that keeps the business data state consistent with successful message delivery. To that end, the messaging layer has to provide the following core capabilities:</p><ul><li>Producer: if the business state changes, the corresponding message is eventually sent successfully.</li><li>Message Broker: acks = all</li><li>Consumer: a message that was not consumed successfully must be recorded and consumed again.</li></ul><p>How to make the Producer transactional:</p><ul><li>[A1] Put the business change and the message-log insert in the same database transaction (leave a trace first, then send).</li><li>[B1/B2] Reliable delivery: in each MQ instance, a daemon keeps polling for messages from a recent time window that should have been sent but were not sent successfully (with a retry count).</li><li>[A2] To improve timeliness: once the state and the message have been committed together, the business code reads the message in its After Commit hook and sends it itself.</li></ul><p><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/transactional_messages.png" alt="Transactional Messages"></p>]]></content>
    
    <summary type="html">
    
      &lt;p&gt;&lt;img src=&quot;https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/franz-kafka-en-1906.jpg&quot; alt=&quot;Kafka&quot;&gt;&lt;/p&gt;
&lt;p&gt;I collected notes on how our team uses Apache Kafka and wrote an internal post in Chinese. I would like to pull the fundamental parts into this post.&lt;/p&gt;
    
    </summary>
    
      <category term="Engineering" scheme="http://fwz.github.io/categories/Engineering/"/>
    
      <category term="Message Queue" scheme="http://fwz.github.io/categories/Engineering/Message-Queue/"/>
    
    
      <category term="Kafka" scheme="http://fwz.github.io/tags/Kafka/"/>
    
  </entry>
  
  <entry>
    <title>A 2D Nim Game</title>
    <link href="http://fwz.github.io/2018/02/21/A-2D-Nim-Game/"/>
    <id>http://fwz.github.io/2018/02/21/A-2D-Nim-Game/</id>
    <published>2018-02-21T13:15:26.000Z</published>
    <updated>2019-05-02T17:22:51.605Z</updated>
    
    <content type="html"><![CDATA[<p><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/2d-nim-1.png" alt="2D Nim"></p><h3 id="Intro"><a href="#Intro" class="headerlink" title="Intro"></a>Intro</h3><p>This article is about a Nim game I played in primary school. The rules are simple:</p><ul><li>There are 16 stones, arranged in a 4 row * 4 column grid.</li><li>Players take stones in turn.</li><li>Once a stone is taken, it leaves a gap at its original location on the grid.</li><li>In each turn, a player takes 1, 2 or 3 stones. Only stones in the same row or the same column, with no gap between them, can be taken together.</li><li>The player who takes the last stone <em>LOSES</em>.</li></ul><p>Let me demonstrate. In the following chart, <code>1</code> means there is a stone on the cell and <code>0</code> represents a gap. A state in which the player to move is certain to lose is called a <strong>Losing State</strong>.</p><ul><li>the initial state is shown as (I)</li><li>the ending state is shown as (II)</li><li>one of the losing states (LS) for the player to move is shown as (III) </li></ul><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">1111   0000   0000</span><br><span class="line">1111   0000   0000</span><br><span class="line">1111   0000   0010</span><br><span class="line">1111   0000   0000</span><br><span class="line">(I)    (II)   (III)</span><br></pre></td></tr></table></figure><p>To be more specific, the following sequence of states is a possible game.</p><a id="more"></a><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span 
class="line">5</span><br></pre></td><td class="code"><pre><span class="line">Org    (A)    (B)    (A)    (B)    (A)    (B)    (A)    </span><br><span class="line">1111   1110   1110   0110   0110   0110   0000   0000</span><br><span class="line">1111   1110   1110   1110   1110   1110   1110   0000</span><br><span class="line">1111 &gt; 1111 &gt; 1000 &gt; 1000 &gt; 1000 &gt; 0000 &gt; 0000 &gt; 0000</span><br><span class="line">1111   1111   1111   1111   1100   0100   0100   0100</span><br></pre></td></tr></table></figure><h3 id="Initial-analysis"><a href="#Initial-analysis" class="headerlink" title="Initial analysis"></a>Initial analysis</h3><p>From my primary school playing experience, all of the following states are LS (you are welcome to try them). </p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">1100   1001   1011   1010</span><br><span class="line">1100   1001   0111   1010</span><br><span class="line">0000   1001   0000   0000</span><br><span class="line">0000   0000   0000   0000</span><br></pre></td></tr></table></figure><p>Believe it or not, knowing these states makes you 99% unbeatable in primary school.</p><p>But I want to discover the essence of this game. So let’s take a deep dive, 20 years later.</p><p>Unlike basic Nim games that live on the number line, such as the <a href="https://en.wikipedia.org/wiki/Nim#The_subtraction_game_S(1,_2,_._._.,_k)" target="_blank" rel="noopener">subtraction game</a>, this game is played in 2D space, and each turn offers many variations. My first thought was that this game is about graph theory, and that analysing a state becomes analysing connected graphs, since the stones removed in one turn must be connected. 
After a quick assessment of my poor graph theory knowledge, I shamelessly gave up on this direction and tried to analyse the state space of the game instead.</p><p>Thanks to the simple nature of this game, there are only 2^16 = 65536 states. </p><p>We also know that a player facing any of the following 16 states is certain to lose. If we can leave one of these 16 states to our opponent, we win the game. </p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">1000  0100  0010  0001  0000  0000  ...  0000  0000</span><br><span class="line">0000  0000  0000  0000  1000  0100  ...  0000  0000</span><br><span class="line">0000  0000  0000  0000  0000  0000  ...  0000  0000</span><br><span class="line">0000  0000  0000  0000  0000  0000  ...  0010  0001</span><br></pre></td></tr></table></figure><p>So any state that can reach one of these 16 states in one move is a winning state for us: if we get such a state, we have at least one way to turn it into one of the 16 losing states above for our opponent. For example, these are some of the winning states that lead to the first of the states above.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">1100 1110  1111  1000  1000  1000</span><br><span class="line">0000 0000  0000  1000  1000  0100</span><br><span class="line">0000 0000  0000  0000  1000  0000</span><br><span class="line">0000 0000  0000  0000  0000  0000  ...</span><br></pre></td></tr></table></figure><p>After calculating all such states, we easily get 600+ winning states.</p><p>Great, but how to proceed? 
Is any state that can move to a winning state in one move a losing state? Not necessarily. Check the first 2 states in the chart above, <code>1100</code> and <code>1110</code>: <code>1110</code> can move to <code>1100</code>, but it is still a winning state because we can choose to leave only <code>1000</code> to our opponent.</p><p>But it gives us a hint about how to reason when both players are smart enough:</p><blockquote><p>If <strong>ANY</strong> next state of the current state <strong>LOSES</strong>, the current state <strong>WINS</strong></p></blockquote><p>And with further consideration,</p><blockquote><p>Only if <strong>ALL</strong> next states of the current state <strong>WIN</strong> does the current state <strong>LOSE</strong></p></blockquote><p>Let’s do a quick test with the ZERO state. If a player gets this state, the opponent has just taken the last stone, so the ZERO state is a winning state. Each of the previous 16 states has only 1 next state, the ZERO state, so they are indeed losing states.</p><p>Our game is a typical zero-sum perfect-information game, so <a href="https://en.wikipedia.org/wiki/Zermelo%27s_theorem_(game_theory)" target="_blank" rel="noopener">Zermelo’s Theorem</a> applies. In such a game, for any state, one side has a strategy that is certain to win. This also means every state in the game is either a winning state or a losing state. </p><p>It seems we are making progress, but we still need a way to expand the sets of losing and winning states. The game is solved once all 65536 states are classified.</p><h3 id="Solution-generation"><a href="#Solution-generation" class="headerlink" title="Solution generation"></a>Solution generation</h3><p>With the support of Zermelo, and the previous deduction that:</p><blockquote><p>If <strong>ANY</strong> next state of the current state <strong>LOSES</strong>, the current state <strong>WINS</strong>. 
Otherwise, the current state <strong>LOSES</strong></p></blockquote><p>we are able to design the following algorithm, which expands the set of classified states starting from the ZERO state.</p><p>First, define a binary notation for a state. For example, the following state converts to (1111000011111111)2 = 61695<br><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">1111   </span><br><span class="line">0000</span><br><span class="line">1111 </span><br><span class="line">1111</span><br></pre></td></tr></table></figure></p><p>In this notation, for any state <code>S</code> and each of its next states <code>N</code>, <code>(S)2</code> &gt; <code>(N)2</code>, because we get <code>N</code> by removing one or more <code>1</code>s from <code>S</code>.</p><p>By the deduction above, if the win/lose status of all of S’s next states is known, then the win/lose status of S can be determined. 
This is a straightforward problem for dynamic programming: after initializing the ZERO state as a winning state, we can iterate over the whole state space in increasing order.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line">lose_set = set()</span><br><span class="line">win_set = set()</span><br><span class="line"></span><br><span class="line"><span class="comment"># init</span></span><br><span class="line">win_set.add(<span class="number">0</span>)</span><br><span class="line"></span><br><span class="line"><span class="comment"># start calc</span></span><br><span class="line"><span class="keyword">for</span> i <span class="keyword">in</span> range(<span class="number">1</span>, <span class="number">65536</span>):</span><br><span class="line">    flag = <span class="keyword">True</span></span><br><span class="line"></span><br><span class="line">    <span class="comment"># if every next state of the current board is a win board, the current board loses</span></span><br><span class="line">    <span class="keyword">for</span> board, number <span class="keyword">in</span> find_all_adj_board(num2board(i), <span class="number">1</span>):</span><br><span class="line">        <span class="keyword">if</span> number <span class="keyword">not</span> <span class="keyword">in</span> 
win_set:</span><br><span class="line">            flag = <span class="keyword">False</span></span><br><span class="line">            <span class="keyword">break</span></span><br><span class="line">    <span class="keyword">if</span> flag:</span><br><span class="line">        lose_set.add(i)</span><br><span class="line">    <span class="keyword">else</span>:</span><br><span class="line">        <span class="comment"># must be win or lose, according to Zermelo</span></span><br><span class="line">        win_set.add(i)</span><br></pre></td></tr></table></figure><p>The number of next states is bounded above by 4*4*2*3, because from any cell one can remove a run of at most 3 different lengths in each of 2 directions (down / right).</p><p>More generally, for an N*N grid with the rules above, this algorithm takes O(2^(N*N)*2*3*N*N) time to iterate over all possible states. For N = 4 it takes 30 seconds to generate all states on my computer, so it would take much longer for N = 5.</p><p>If you want to know whether the first player or the second player can force a win, check out the game script I wrote on GitHub: <a href="https://github.com/fwz/2d-nim/" target="_blank" rel="noopener">2d-nim</a> and test it!</p><h3 id="Summary"><a href="#Summary" class="headerlink" title="Summary"></a>Summary</h3><p>We have discussed the 2D Nim game and used Zermelo’s Theorem and dynamic programming to get the ultimate strategy for it. Although we have solved the game, I still believe there are topology-based solutions with lower complexity for larger boards.</p>]]></content>
    
    <summary type="html">
    
      &lt;p&gt;&lt;img src=&quot;https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/2d-nim-1.png&quot; alt=&quot;2D Nim&quot;&gt;&lt;/p&gt;
&lt;h3 id=&quot;Intro&quot;&gt;&lt;a href=&quot;#Intro&quot; class=&quot;headerlink&quot; title=&quot;Intro&quot;&gt;&lt;/a&gt;Intro&lt;/h3&gt;&lt;p&gt;This article is about a Nim game I played in primary school. The rules are simple:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;There are 16 stones, arranged in a 4 row * 4 column grid.&lt;/li&gt;
&lt;li&gt;Players take stones in turn.&lt;/li&gt;
&lt;li&gt;Once a stone is taken, it leaves a gap at its original location on the grid.&lt;/li&gt;
&lt;li&gt;In each turn, a player takes 1, 2 or 3 stones. Only stones in the same row or the same column, with no gap between them, can be taken together.&lt;/li&gt;
&lt;li&gt;The player who takes the last stone &lt;em&gt;LOSES&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let me demonstrate. In the following chart, &lt;code&gt;1&lt;/code&gt; means there is a stone on the cell and &lt;code&gt;0&lt;/code&gt; represents a gap. A state in which the player to move is certain to lose is called a &lt;strong&gt;Losing State&lt;/strong&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the initial state is shown as (I)&lt;/li&gt;
&lt;li&gt;the ending state is shown as (II)&lt;/li&gt;
&lt;li&gt;one of the losing states (LS) for the player to move is shown as (III) &lt;/li&gt;
&lt;/ul&gt;
&lt;figure class=&quot;highlight plain&quot;&gt;&lt;table&gt;&lt;tr&gt;&lt;td class=&quot;gutter&quot;&gt;&lt;pre&gt;&lt;span class=&quot;line&quot;&gt;1&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;2&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;3&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;4&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;5&lt;/span&gt;&lt;br&gt;&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;line&quot;&gt;1111   0000   0000&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;1111   0000   0000&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;1111   0000   0010&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;1111   0000   0000&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;(I)    (II)   (III)&lt;/span&gt;&lt;br&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/figure&gt;
&lt;p&gt;To be more specific, the following sequence of states is a possible game.&lt;/p&gt;
    
    </summary>
    
      <category term="Game" scheme="http://fwz.github.io/categories/Game/"/>
    
      <category term="Board Game" scheme="http://fwz.github.io/categories/Game/Board-Game/"/>
    
    
      <category term="Game Theory" scheme="http://fwz.github.io/tags/Game-Theory/"/>
    
  </entry>
  
  <entry>
    <title>Understanding Fstab</title>
    <link href="http://fwz.github.io/2017/06/07/Understanding-Fstab/"/>
    <id>http://fwz.github.io/2017/06/07/Understanding-Fstab/</id>
    <published>2017-06-07T14:35:56.000Z</published>
    <updated>2019-05-02T05:26:30.254Z</updated>
    
    <content type="html"><![CDATA[<p><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/fstab.jpg" alt=""></p><p>Today I encountered a very strange problem: all recently deployed applications on a specific host failed to start, with a simple error message <strong>“Permission Denied. xxxx.sh could not be executed”</strong>. </p><p>Nonsense! They had been running for a very long time under a Jenkins-driven deployment, started by a deploy script. Whether a binary “could not be executed” is normally decided by the user and the file access flags. I checked both: the user was <code>root</code> and the access flags were <code>755</code>.</p><p>Then I remembered that Ops had re-mounted the hard disk on this host a few days earlier, and that operation might have changed some filesystem runtime context. So I wrote a simple test script <code>a.sh</code>, placed it under two different mount points, <code>/tmp</code> and <code>/data</code>, and tried to execute it via <code>./a.sh</code> (not <code>bash ./a.sh</code>, because in that case bash only reads the script, which needs the <code>r</code> permission rather than <code>x</code>). The script could be executed under <code>/tmp</code> but not under <code>/data</code>.</p><p>It seems we are close to the root cause, but what stops us from executing a binary under a particular mount point? The answer is in <code>fstab</code>.</p><a id="more"></a><blockquote><p>fstab is a configuration file that contains information of all the partitions and storage devices in your computer. The file is located under /etc, so the full path to this file is /etc/fstab. /etc/fstab contains information of where your partitions and storage devices should be mounted and how.   </p></blockquote><p>After viewing the content of <code>/etc/fstab</code>, we can see why this happens. 
Here is the content:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">#</span><br><span class="line"># /etc/fstab</span><br><span class="line"># Created by anaconda on Fri Aug 26 16:02:35 2016</span><br><span class="line">#</span><br><span class="line"># Accessible filesystems, by reference, are maintained under &apos;/dev/disk&apos;</span><br><span class="line"># See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info</span><br><span class="line">#</span><br><span class="line">UUID=7e452929-a3a3-4f1f-80a3-91eced90b453 / xfs defaults 0 0</span><br><span class="line">UUID=b797b27d-0499-451e-a1cc-8d7fc014777c /boot xfs defaults 0 0</span><br><span class="line">UUID=a47c3334-ae2c-486f-9756-7172b5570035 swap swap defaults 0 0</span><br><span class="line"></span><br><span class="line">/dev/mapper/centos-data /data xfs defaults,noexec 0 0</span><br></pre></td></tr></table></figure><p>Each entry has 6 fields, which represent:</p><ul><li>block special device or remote filesystem to be mounted</li><li>mount point of the filesystem</li><li>filesystem type</li><li>mount options associated with the filesystem</li><li>dump(8) flag</li><li>boot check sequence  </li></ul><p>With the help of <code>man 5 fstab</code> and <code>man 8 mount</code> we can see that the <code>/data</code> mount point carries a ridiculous <code>noexec</code> option. According to the <code>fstab(5)</code> man page, about the fourth field (fs_mntops):</p><ul><li>This field describes the mount options associated with the filesystem.   
</li><li>It  is  formatted  as a comma separated list of options.  </li><li>It contains at least the type of mount plus any additional options appropriate to the filesystem type.  </li><li>For documentation  on the available mount options, see mount(8).</li><li>For documentation on the available swap options, see swapon(8).</li></ul><p>Then check <code>mount(8)</code>:  </p><ul><li><strong>noexec</strong>  Do not allow direct execution of any binaries on the mounted  filesystem. (Until recently it was possible to run binaries anyway using a command like /lib/ld*.so /mnt/binary. This trick fails since Linux 2.4.25 / 2.6.0.)</li></ul><p><strong>So the problem was resolved after we removed the “,noexec” option from the <code>/data</code> mount point.</strong> The earlier statement, that whether a binary can be executed is decided by the user and the file access flags, was not accurate enough. Binary execution is also controlled by <strong>filesystem mount options</strong> in “/etc/fstab”.</p><p>Now it’s time to ask why Ops assigned such a flag on this mount…</p>]]></content>
    
    <summary type="html">
    
      &lt;p&gt;&lt;img src=&quot;https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/fstab.jpg&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Today I encountered a very strange problem: all recently deployed applications on a specific host failed to start, with a simple error message &lt;strong&gt;“Permission Denied. xxxx.sh could not be executed”&lt;/strong&gt;. &lt;/p&gt;
&lt;p&gt;Nonsense! They had been running for a very long time under a Jenkins-driven deployment, started by a deploy script. Whether a binary “could not be executed” is normally decided by the user and the file access flags. I checked both: the user was &lt;code&gt;root&lt;/code&gt; and the access flags were &lt;code&gt;755&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Then I remembered that Ops had re-mounted the hard disk on this host a few days earlier, and that operation might have changed some filesystem runtime context. So I wrote a simple test script &lt;code&gt;a.sh&lt;/code&gt;, placed it under two different mount points, &lt;code&gt;/tmp&lt;/code&gt; and &lt;code&gt;/data&lt;/code&gt;, and tried to execute it via &lt;code&gt;./a.sh&lt;/code&gt; (not &lt;code&gt;bash ./a.sh&lt;/code&gt;, because in that case bash only reads the script, which needs the &lt;code&gt;r&lt;/code&gt; permission rather than &lt;code&gt;x&lt;/code&gt;). The script could be executed under &lt;code&gt;/tmp&lt;/code&gt; but not under &lt;code&gt;/data&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;It seems we are close to the root cause, but what stops us from executing a binary under a particular mount point? The answer is in &lt;code&gt;fstab&lt;/code&gt;.&lt;/p&gt;
    
    </summary>
    
      <category term="Engineering" scheme="http://fwz.github.io/categories/Engineering/"/>
    
      <category term="Dev Ops" scheme="http://fwz.github.io/categories/Engineering/Dev-Ops/"/>
    
    
      <category term="Linux" scheme="http://fwz.github.io/tags/Linux/"/>
    
  </entry>
  
  <entry>
    <title>2017 Resolution</title>
    <link href="http://fwz.github.io/2017/02/01/2017-Resolution/"/>
    <id>http://fwz.github.io/2017/02/01/2017-Resolution/</id>
    <published>2017-02-01T09:37:27.000Z</published>
    <updated>2019-05-02T17:29:22.325Z</updated>
    
    <content type="html"><![CDATA[<p><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/2017-resolution.jpg" alt=""></p><p>I drafted this new year resolution on the plane. Looking forward to a fruitful year!</p><a id="more"></a><h2 id="Professional-Skills"><a href="#Professional-Skills" class="headerlink" title="Professional Skills"></a>Professional Skills</h2><ul><li>Write at least 5 blog posts on methods and principles for solving real-world technical problems.</li><li>Finish a Machine Learning / Deep Learning related course</li><li>Investigate 5 &amp; Master 1 Open-Source Machine Learning Libraries</li><li>Build a strong team: one that is capable of delivering high-quality products, is self-driven and motivated, and has a high reputation.</li></ul><p>My decision to work on AI was encouraged by the performance of AlphaGo and Libratus in 2016. They paint an impressive picture for us, and I am pretty sure that AI will be our next tipping point. I’m already late, but now is still the best time to start.</p><h2 id="Music"><a href="#Music" class="headerlink" title="Music"></a>Music</h2><p>Since I moved out and have more time to practise instruments, I am setting very aggressive music goals this year.</p><ul><li>Produce 2 Songs (not necessarily original) &amp; Design the cover for them</li><li>Practice the first half of “66 drum set training”</li><li>Practice the first half of “Berklee A Modern Method for Guitar : Volume 1”</li><li>Get in touch with a new instrument and be able to play a simple melody (I guessed violin or viola would be a good choice, but I finally chose electric guitar, and hopefully I will be able to play some punk next year)</li></ul><h2 id="Fitness"><a href="#Fitness" class="headerlink" title="Fitness"></a>Fitness</h2><p>Physical happiness guarantees everything!</p><ul><li>Get Level 7 on Keep</li><li>Get a six-pack</li><li>Swim 40000 m</li><li>Practise butterfly and be able to swim 100m without exhaustion</li><li>Be able to ski on 
Nanshan’s intermediate slopes. (Achieved)</li></ul><h2 id="Financial"><a href="#Financial" class="headerlink" title="Financial"></a>Financial</h2><ul><li>Get 100K RMB non-salary income</li></ul><h2 id="Health"><a href="#Health" class="headerlink" title="Health"></a>Health</h2><ul><li>Sleep before 12:30 am on weekdays (Failed)</li></ul><h2 id="Travels"><a href="#Travels" class="headerlink" title="Travels"></a>Travels</h2><ul><li>Pay a visit to Iceland!</li></ul>]]></content>
    
    <summary type="html">
    
      &lt;p&gt;&lt;img src=&quot;https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/2017-resolution.jpg&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;I drafted this new year resolution on the plane. Looking forward to a fruitful year!&lt;/p&gt;
    
    </summary>
    
      <category term="Career" scheme="http://fwz.github.io/categories/Career/"/>
    
      <category term="Resolution" scheme="http://fwz.github.io/categories/Career/Resolution/"/>
    
    
      <category term="2017" scheme="http://fwz.github.io/tags/2017/"/>
    
      <category term="Resolution" scheme="http://fwz.github.io/tags/Resolution/"/>
    
  </entry>
  
  <entry>
    <title>The Data Migration Problem</title>
    <link href="http://fwz.github.io/2016/08/18/The-Data-Migration-Problem/"/>
    <id>http://fwz.github.io/2016/08/18/The-Data-Migration-Problem/</id>
    <published>2016-08-18T09:37:27.000Z</published>
    <updated>2019-05-02T17:28:39.603Z</updated>
    
    <content type="html"><![CDATA[<p><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/cool-jenkins2x3.png" alt="Migrate the Hotel!"><br>Today I am trying to solve a very interesting problem, which I would like to call the “data migration problem”. </p><a id="more"></a><h2 id="Background"><a href="#Background" class="headerlink" title="Background"></a>Background</h2><p>Let me describe the problem briefly. </p><ul><li>We are migrating the data of a PMS system to a newer version. </li><li>This system contains 2 major entities: <strong>USER</strong> &amp; <strong>HOTEL</strong></li><li>Each <strong>USER</strong> can operate 1 or more <strong>HOTELS</strong></li><li>Each <strong>HOTEL</strong> can be operated by 1 or more <strong>USERS</strong></li><li>Migration runs in batches of hotels. Each batch migrates 1 or more hotels.</li></ul><h2 id="Requirement"><a href="#Requirement" class="headerlink" title="Requirement"></a>Requirement</h2><ul><li>After each batch of migration, all hotels operated by the same user should be in the same status (migrated / non-migrated).</li><li>During migration, none of the hotels in the batch can be operated (downtime happens). 
So we want to minimize the size of each batch while meeting the requirement above.</li></ul><h2 id="Examples"><a href="#Examples" class="headerlink" title="Examples"></a>Examples</h2><p>Each input line represents an operation / management relation<br><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">input:</span><br><span class="line">    user_1   hotel_1</span><br><span class="line">    user_2   hotel_2</span><br><span class="line">    user_3   hotel_3</span><br><span class="line"></span><br><span class="line">output:</span><br><span class="line">    [hotel_1]</span><br><span class="line">    [hotel_2]</span><br><span class="line">    [hotel_3]</span><br></pre></td></tr></table></figure></p><p>Each user operates a different hotel, so each hotel can be migrated in a separate batch.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">input:</span><br><span class="line">    user_1   hotel_1</span><br><span class="line">    user_2   hotel_1</span><br><span class="line">    user_2   hotel_2</span><br><span class="line">    user_3   hotel_2</span><br><span class="line">    user_3   hotel_3</span><br><span class="line"></span><br><span class="line">output:</span><br><span class="line">    [hotel_1, hotel_2, hotel_3]</span><br></pre></td></tr></table></figure><p>All hotels should be migrated in the 
same batch. Once hotel_1 is migrated, all other hotels that user_2 operates (here, hotel_2) should be migrated as well, because user_2 also operates hotel_1. Similarly, once hotel_2 is migrated, hotel_3 should also be migrated (in the same batch), because user_3 operates both of these hotels.</p><h2 id="Solutions"><a href="#Solutions" class="headerlink" title="Solutions"></a>Solutions</h2><p>At first glance, the data structure looks very much like a <a href="https://en.wikipedia.org/wiki/Bipartite_graph" target="_blank" rel="noopener">BIPARTITE GRAPH</a>. The nodes fall into two parts, the hotels and the users, and the edges are the relations between hotels and users. There are no edges between hotels, nor between users.</p><p><img src="http://7ktqal.com1.z0.glb.clouddn.com/img/blog/Simple-bipartite-graph.png" alt="Bipartite graph"></p><p>However, we are not trying to assign capable users to as many hotels as possible, so this is not a classical bipartite graph <a href="https://en.wikipedia.org/wiki/Matching_%28graph_theory%29" target="_blank" rel="noopener">MATCHING PROBLEM</a>. Instead, we want to know which nodes are connected via edges, which turns it into another problem: “finding all <a href="https://en.wikipedia.org/wiki/Connected_component_%28graph_theory%29" target="_blank" rel="noopener">CONNECTED COMPONENTS</a>”. </p><p><img src="http://7ktqal.com1.z0.glb.clouddn.com/img/blog/Pseudoforest.svg.png" alt="Connected Components"></p><p>Of course BFS/DFS can answer such questions. However, one piece of information is not important to our current problem: the structure. We don’t need to know which manager connects ‘Ritz Carlton’ and ‘Hilton’; we just need to make sure both hotels are in the same batch. 
So we can make our algorithm more efficient by using a <a href="https://en.wikipedia.org/wiki/Disjoint-set_data_structure" target="_blank" rel="noopener">DISJOINT SET</a>.</p><h2 id="Explanations"><a href="#Explanations" class="headerlink" title="Explanations"></a>Explanations</h2><p>Let’s look at the <code>Elem</code> class. It is actually an implementation of the disjoint set.</p><ul><li>When initialized, each <code>Elem</code> can be seen as a single-element set.</li><li><code>parent</code> is a pointer to its parent <code>Elem</code>.</li><li>To judge whether two elements are in the same set, check whether their <code>root</code> (topmost ancestor) is the same element.</li><li>To <code>union</code> two sets, we find the <code>root</code> of each set and set the parent of one root to the other root.</li></ul><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">root</span><span class="params">(self)</span>:</span></span><br><span class="line">    <span class="keyword">if</span> self.parent == self:</span><br><span class="line">        <span class="keyword">return</span> self</span><br><span class="line">    <span class="keyword">else</span>:</span><br><span class="line">        <span class="comment"># recursively find the ancestor</span></span><br><span class="line">        <span class="keyword">return</span> self.parent.root() </span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">union</span><span class="params">(self, 
y)</span>:</span></span><br><span class="line">    my_root = self.root()</span><br><span class="line">    my_root.parent = y.root()</span><br></pre></td></tr></table></figure><p>And the iteration algorithm is:</p><ol><li>Each user and each hotel is initialized as an individual set.</li><li>Iterate over all hotels; for each hotel:<ul><li>iterate over each user who operates it:<ul><li>If the user has not been unioned into any hotel set yet, union the user with the current hotel.</li><li>Otherwise, the user already belongs to an existing hotel set. In this case, that set and the current hotel’s set should be merged.</li></ul></li></ul></li></ol><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">for</span> h <span class="keyword">in</span> hotels:</span><br><span class="line">    <span class="keyword">for</span> u <span class="keyword">in</span> hotels[h]:</span><br><span class="line">        <span class="keyword">if</span> up[u].parent == up[u]:</span><br><span class="line">            up[u].parent = hp[h]</span><br><span class="line">        <span class="keyword">else</span>:</span><br><span class="line">            up[u].union(hp[h])</span><br></pre></td></tr></table></figure><p>The <a href="https://en.wikipedia.org/wiki/Loop_invariant" target="_blank" rel="noopener">loop invariant</a> of this algorithm guarantees that:</p><ol><li>After each iteration of the hotel loop, the root of any hotel Elem is always another hotel or itself. This means the hotel is correctly united with all the hotels seen so far that must be migrated together with it, if any, in the same batch.</li><li>After each iteration of the user loop for any hotel, the root of that user’s Elem is always a hotel. </li></ol><p>That’s it. 
</p><h2 id="Code"><a href="#Code" class="headerlink" title="Code"></a>Code</h2><p>Here I feed this python script with a comma separated csv file, marking users and hotels.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span 
class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> sys</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">Elem</span><span class="params">()</span>:</span></span><br><span class="line">    parent = <span class="keyword">None</span></span><br><span class="line">    value = <span class="keyword">None</span></span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self, value)</span>:</span></span><br><span class="line">        self.value = value</span><br><span class="line">        self.parent = self</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">root</span><span class="params">(self)</span>:</span></span><br><span class="line">        <span class="keyword">if</span> self.parent == self:</span><br><span class="line">            <span class="keyword">return</span> self</span><br><span class="line">        <span class="keyword">else</span>:</span><br><span class="line">            <span class="keyword">return</span> self.parent.root()</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">union</span><span class="params">(self, y)</span>:</span></span><br><span class="line">        my_root = self.root()</span><br><span class="line">        my_root.parent = y.root()</span><br><span class="line"></span><br><span class="line">    <span 
class="function"><span class="keyword">def</span> <span class="title">__str__</span><span class="params">(self)</span>:</span></span><br><span class="line">        <span class="keyword">return</span> <span class="string">"[value: %s, parent: %s]"</span> % (self.value, self.root().value)</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> __name__ == <span class="string">"__main__"</span>:</span><br><span class="line">    users = set()</span><br><span class="line">    hotels = &#123;&#125;</span><br><span class="line">    hp = &#123;&#125;</span><br><span class="line">    up = &#123;&#125;</span><br><span class="line">    res = &#123;&#125;</span><br><span class="line"></span><br><span class="line">    <span class="keyword">for</span> line <span class="keyword">in</span> sys.stdin:</span><br><span class="line">        (u, h) = line.strip().split(<span class="string">","</span>)</span><br><span class="line"></span><br><span class="line">        <span class="keyword">if</span> u <span class="keyword">not</span> <span class="keyword">in</span> users:</span><br><span class="line">            users.add(u)</span><br><span class="line"></span><br><span class="line">        <span class="keyword">if</span> h <span class="keyword">in</span> hotels:</span><br><span class="line">            hotels[h].add(u)</span><br><span class="line">        <span class="keyword">else</span>:</span><br><span class="line">            hotels[h] = set([u])</span><br><span class="line"></span><br><span class="line">    <span class="comment"># init hotel_sets </span></span><br><span class="line">    <span class="keyword">for</span> h <span class="keyword">in</span> hotels:</span><br><span class="line">        hp[h] = Elem(h) </span><br><span class="line"></span><br><span class="line">    <span class="keyword">for</span> u <span class="keyword">in</span> users:</span><br><span class="line">        up[u] = Elem(u)</span><br><span class="line"></span><br><span 
class="line">    <span class="comment"># start partitions</span></span><br><span class="line">    <span class="keyword">for</span> h <span class="keyword">in</span> hotels:</span><br><span class="line">        <span class="keyword">for</span> u <span class="keyword">in</span> hotels[h]:</span><br><span class="line">            <span class="keyword">if</span> up[u].parent == up[u]:</span><br><span class="line">                up[u].parent = hp[h]</span><br><span class="line">            <span class="keyword">else</span>:</span><br><span class="line">                up[u].union(hp[h])</span><br><span class="line"></span><br><span class="line">    <span class="keyword">for</span> h <span class="keyword">in</span> hp:</span><br><span class="line">        root = hp[h].root().value </span><br><span class="line">        <span class="keyword">if</span> root <span class="keyword">in</span> res:</span><br><span class="line">            res[root].append(h)</span><br><span class="line">        <span class="keyword">else</span>:</span><br><span class="line">            res[root] = [h]</span><br><span class="line"></span><br><span class="line">    <span class="keyword">for</span> r <span class="keyword">in</span> res:</span><br><span class="line">        print(res[r])</span><br></pre></td></tr></table></figure>]]></content>
    
    <summary type="html">
    
      &lt;p&gt;&lt;img src=&quot;https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/cool-jenkins2x3.png&quot; alt=&quot;Migrate the Hotel!&quot;&gt;&lt;br&gt;Today I am trying to solve a very interesting problem, I would like to call it the “data migration problem”. &lt;/p&gt;
    
    </summary>
    
      <category term="Engineering" scheme="http://fwz.github.io/categories/Engineering/"/>
    
      <category term="Algorithm" scheme="http://fwz.github.io/categories/Engineering/Algorithm/"/>
    
    
      <category term="Algorithm" scheme="http://fwz.github.io/tags/Algorithm/"/>
    
  </entry>
  
  <entry>
    <title>Soft Skills Programmers Need</title>
    <link href="http://fwz.github.io/2016/07/21/Soft-Skills-Programmers-Need/"/>
    <id>http://fwz.github.io/2016/07/21/Soft-Skills-Programmers-Need/</id>
    <published>2016-07-21T02:00:26.000Z</published>
    <updated>2019-05-02T05:25:48.849Z</updated>
    
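    <content type="html"><![CDATA[<p><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/soft_skills.png" alt="Soft Skills"></p><h1 id="程序员都需要哪些软技能"><a href="#程序员都需要哪些软技能" class="headerlink" title="程序员都需要哪些软技能"></a>What Soft Skills Do Programmers Need</h1><p>I came across a very interesting question on a question-and-answer site: what soft skills do programmers need? I happened to finish a great book a while ago, so here are a few rough thoughts distilled from it:</p><ul><li>Willingness to keep learning</li><li>Speaking and writing skills</li><li>The ability to set reasonable expectations</li><li>The ability to manage the complexity of tasks</li><li>The ability to build a personal brand</li><li>Financial management skills</li></ul><p>Let’s go through them one by one.</p><a id="more"></a><h3 id="持续学习意愿"><a href="#持续学习意愿" class="headerlink" title="持续学习意愿"></a>Willingness to keep learning</h3><p>Strictly speaking, this hardly counts as a soft skill; it is plainly what we make our living with. If your willingness to keep learning is low, your own progress will be very limited.</p><p>Take the most common example: Xiao Ming runs into a new problem at work.</p><p>If Xiao Ming’s ability and willingness to teach himself are low, he will tend to solve the new problem with old knowledge. Specific knowledge usually suits specific problems, so grinding through the task with what he already knows is not necessarily the best solution. When the project is done he has not grown much and feels resentful: why does all this junk get dumped on me?</p><p>If Xiao Ming is eager to learn, he will probably ask whether someone else has met this problem before and whether a mature method or framework already solves it. If so, he does some research to see what can be borrowed or reused. When the project is done, he has spent less time and picked up a pile of new skill points.</p><p>Accumulated over several years, these two habits lead to very different results.<br>One person has n years of solid working experience; the other has 1 year of experience plus n-1 years of repeated labor.</p><h3 id="口头表达和写作能力"><a href="#口头表达和写作能力" class="headerlink" title="口头表达和写作能力"></a>Speaking and writing skills</h3><p>Simply put, communication means explaining something clearly. How well you do it basically depends on two factors: the ability to think, and the ability to express the conclusions of that thinking.</p><p>Thinking helps us sort out how things relate to each other in our heads and gives us an overall picture. People who think well can talk at length, while people who think poorly may have little to say. So I tend to notice who speaks faster; those people are probably the smarter ones. Why? The brain has to keep up with the mouth.</p><p>The ability to express our thinking helps us convey conclusions to the audience as completely as possible. Besides putting yourself in the listener’s shoes and considering their background and capacity to absorb, one important technique is to use fewer pronouns and replace them with concrete names. This lowers the cost of understanding, so the audience can take in the information more smoothly and respond.</p><p>A weakness in either ability shows the moment you open your mouth: for example, when you discuss (fight) with product colleagues, when you file an issue on GitHub and nobody can make sense of it, or when you summarize something and the logic is a mess. In the end you earn a reputation for “needing to improve communication skills”.</p><p>Finding every opportunity to give technical talks does a lot to improve your speaking and writing.</p><p>As an aside: the question “what is the most important language” has always had one indisputable answer: English. At least among programmers.</p><h3 id="建立合理预期的能力"><a href="#建立合理预期的能力" class="headerlink" title="建立合理预期的能力"></a>The ability to set reasonable expectations</h3><p>The expectations here are not only other people’s expectations but also the expectations you have of yourself.</p><p>Other people’s expectations are a rather mysterious thing. Combined with our enigmatic pride, they are very good at firing up a programmer’s fighting spirit, but they are also very likely to drag us into one big pit after another (late delivery, late-night overtime). Apart from hurting your credibility and disrupting other people’s schedules, such pits bring no benefit at all. So although we admire “Fake it till you make it”, that does not mean we should smile, accept every messy task, and then work ourselves to death to deliver it.</p><p>All of these expectations take time to establish and calibrate. Even senior programmers tend to overestimate themselves more often than not. One recommended approach is therefore to set up monitoring: each morning, note what you expect to get done, then check how much you actually did that day and whether it matched your estimate. Over time you will arrive at a reasonable understanding of your own capacity.</p><h3 id="管理事务复杂度的能力"><a href="#管理事务复杂度的能力" class="headerlink" 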
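title="管理事务复杂度的能力"></a>The ability to manage the complexity of tasks</h3><p>One of a programmer’s core jobs is to reduce complexity, so that more energy can go into other meaningful things.</p><p>Three simple techniques help programmers manage the complexity of the tasks around them: automation, consistency, and priority.</p><p>Automation: this is actually the easiest one for programmers. Any task that has to be repeated three times or more should be automated. A few familiar examples: have you set up SSH for password-free login to your remote hosts? Are severe errors in the production system pushed to you as alerts by monitoring, or do you check for them by hand every day? Are the statistics your boss wants each day looked up manually and pasted into an email, or queried and sent automatically on schedule? Each of these takes very little time, but if you spend time on such trivial tasks every day, it is hard to get into a smooth working flow.</p><p>Consistency: if we always handle a given kind of work the same way, we can more easily enjoy the dividends of that consistency. A few examples:</p><ol><li>I used to write Python by hand in vim. Back then a completion tool like YouCompleteMe did not exist yet, and nothing warned you about a misspelled function name, so I often agonized over whether I had really defined a function under that name and had to go back and check several times. Later I studied the coding style guide properly and named every function according to it. That way I can be sure that naming the same function twice produces the same name, and the same holds when calling it.</li><li>Still on the command line: when running git diff, I always put the base on the left and the changes to be committed on the right. Sometimes when a colleague leads a code review the order gets flipped every now and then, which makes the diff frustrating to read.</li></ol><p>Priority: when a pile of things arrives, which should I do first, apart from the high-priority items the boss assigned? Cai Tong (采铜) on Zhihu once gave a very incisive answer: do the things whose benefits have a long half-life. In short, do things that have a long-term positive effect. Spending half an hour on a game, for example, satisfies our desire for entertainment in the short term but is unlikely to have a lasting positive effect. If we recognize this and act on it, spending that half hour on improving job-related skills instead may pay off for a long time.</p><h3 id="个人品牌建设的能力"><a href="#个人品牌建设的能力" class="headerlink" title="个人品牌建设的能力"></a>The ability to build a personal brand</h3><p>Building a personal brand will not show obvious returns in the short term, but in the long run it matters more than many other things (benefit half-life again). Marketing yourself appropriately quietly changes your exposure and reputation and makes it easier to find opportunities to work with others.</p><p>For example, the following are worth making time for:</p><ul><li>Keep writing blog posts</li><li>Write up your technical skills and experience</li><li>Record technical tutorials</li><li>Speak in public</li></ul><p>Don’t worry too much about quality. Your writing will keep getting better, provided that you keep writing and that the content reflects your own thinking.</p><p>Once you have built up some material, you can even go a step further and write a book. These are no longer the traditional days; as long as the content has substance, it is easy to turn it into a book and distribute it. <a href="https://www.gitbook.com/" target="_blank" rel="noopener">GitBook</a> is a very good platform for that.</p><h3 id="财务知识和识别忽悠"><a href="#财务知识和识别忽悠" class="headerlink" title="财务知识和识别忽悠"></a>Financial knowledge and spotting a con</h3><p>Programmers earn decent money, but their life and social experience is trivial compared with that of bosses and CEOs. It is easy to fall into traps that you believed were protecting you, so learning some financial knowledge is also very important. Two concepts that many people may not understand well are stock options and VIE.</p><h3 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h3><p>Programmers need far more soft skills than most professions...</p>]]></content>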
    
    <summary type="html">
    
      &lt;p&gt;&lt;img src=&quot;https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/soft_skills.png&quot; alt=&quot;Soft Skills&quot;&gt;&lt;/p&gt;
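&lt;h1 id=&quot;程序员都需要哪些软技能&quot;&gt;&lt;a href=&quot;#程序员都需要哪些软技能&quot; class=&quot;headerlink&quot; title=&quot;程序员都需要哪些软技能&quot;&gt;&lt;/a&gt;What Soft Skills Do Programmers Need&lt;/h1&gt;&lt;p&gt;I came across a very interesting question on a question-and-answer site: what soft skills do programmers need? I happened to finish a great book a while ago, so here are a few rough thoughts distilled from it:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Willingness to keep learning&lt;/li&gt;
&lt;li&gt;Speaking and writing skills&lt;/li&gt;
&lt;li&gt;The ability to set reasonable expectations&lt;/li&gt;
&lt;li&gt;The ability to manage the complexity of tasks&lt;/li&gt;
&lt;li&gt;The ability to build a personal brand&lt;/li&gt;
&lt;li&gt;Financial management skills&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let’s go through them one by one.&lt;/p&gt;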
    
    </summary>
    
      <category term="Career" scheme="http://fwz.github.io/categories/Career/"/>
    
      <category term="Growth" scheme="http://fwz.github.io/categories/Career/Growth/"/>
    
    
      <category term="Thinking" scheme="http://fwz.github.io/tags/Thinking/"/>
    
  </entry>
  
  <entry>
    <title>Getting Started With Grafana</title>
    <link href="http://fwz.github.io/2016/07/20/Getting-Started-With-Grafana/"/>
    <id>http://fwz.github.io/2016/07/20/Getting-Started-With-Grafana/</id>
    <published>2016-07-20T12:49:51.000Z</published>
    <updated>2019-05-02T17:31:59.436Z</updated>
    
    <content type="html"><![CDATA[<p><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/grafana-demo2.png" alt="Grafana"></p><h1 id="Preface"><a href="#Preface" class="headerlink" title="Preface"></a>Preface</h1><p>When I first saw Grafana, I was immediately struck by its beauty, and I believe it is the right tool for most dashboard / monitoring use cases. This project, with 10,000+ stars, provides a complete solution for metrics and analytics.</p><p>Check this <a href="http://grafana.org/blog/2016/05/11/grafana-3-0-stable-released.html" target="_blank" rel="noopener">live demo</a> and you should feel the same pleasure as I do.</p><h1 id="How-does-Grafana-work"><a href="#How-does-Grafana-work" class="headerlink" title="How does Grafana work"></a>How does Grafana work</h1><p>In short, Grafana is a metrics solution that includes UI rendering, query, and data source features.<br>The UI rendering is amazing, different types of queries are supported, and it supports a wide range of data sources (mostly time-series databases). </p><p>So, to set up Grafana for our own use, what should we do? </p><ul><li>First, we need to set up the data source. </li><li>Second, we set up the Grafana server and connect it to the data source. </li><li>Third, we feed data into the data source.</li></ul><p>That’s it! </p><h1 id="Setting-up"><a href="#Setting-up" class="headerlink" title="Setting up"></a>Setting up</h1><p>The rest of this article covers how to set everything up on Mac OS X. Although we have Homebrew and pip, it takes more effort than it seems. </p><h2 id="Graphite-as-Data-Source"><a href="#Graphite-as-Data-Source" class="headerlink" title="Graphite as Data Source"></a>Graphite as Data Source</h2><p>First of all, we need to choose the data source. 
Grafana supports many databases, including: <a href="http://docs.grafana.org/datasources/graphite/" target="_blank" rel="noopener">Graphite</a>, <a href="http://docs.grafana.org/datasources/elasticsearch/" target="_blank" rel="noopener">Elasticsearch</a>, <a href="http://docs.grafana.org/datasources/cloudwatch/" target="_blank" rel="noopener">CloudWatch</a>, <a href="http://docs.grafana.org/datasources/influxdb/" target="_blank" rel="noopener">InfluxDB</a>, <a href="http://docs.grafana.org/datasources/opentsdb/" target="_blank" rel="noopener">OpenTSDB</a>, <a href="http://docs.grafana.org/datasources/kairosdb" target="_blank" rel="noopener">KairosDB</a>, <a href="http://docs.grafana.org/datasources/prometheus" target="_blank" rel="noopener">Prometheus</a>.</p><p>I will take <strong>Graphite</strong> as our primary data source, for the following reasons:</p><ul><li>powerful data APIs </li><li>friendly render APIs with image output</li><li>data is stored in whisper files, which is operations-friendly</li><li>overall, the design of Graphite is very clean, and every layer of the design is scalable.</li></ul><a id="more"></a><h2 id="Setting-up-Graphite"><a href="#Setting-up-Graphite" class="headerlink" title="Setting up Graphite"></a>Setting up Graphite</h2><p>To set up Graphite, please refer to <a href="http://graphite.readthedocs.io/en/latest/install.html" target="_blank" rel="noopener">the installation doc</a>. Some of its steps are not quite straightforward, so here are the steps I used to install everything. Before we get started, make sure you have Homebrew and pip set up properly.</p><h3 id="Install-all-dependencies"><a href="#Install-all-dependencies" class="headerlink" title="Install all dependencies"></a>Install all dependencies</h3><p>Cairo and Django are needed for Graphite to render graphs on its own. 
We should also use a specific version of Cairo, since the latest 14.x version makes the fonts on Graphite web huge.</p><figure class="highlight"><pre><font face="monospace">cd /usr/local/Library/
git checkout 7073788 /usr/local/Library/Formula/cairo.rb
brew install cairo
brew install py2cairo
sudo pip install cairocffi
pip install Django==1.8
pip install django-tagging</font></pre></figure><h3 id="Install-Graphite-with-pip"><a href="#Install-Graphite-with-pip" class="headerlink" title="Install Graphite with pip"></a>Install Graphite with pip</h3><figure class="highlight"><pre><font face="monospace">pip install <a href="https://github.com/graphite-project/ceres/tarball/master" target="_blank" rel="noopener">https://github.com/graphite-project/ceres/tarball/master</a>
pip install whisper
pip install carbon
pip install graphite-web</font></pre></figure><p>It’s also recommended to change the owner of the Graphite directory if Graphite was installed by root, via<br><figure class="highlight"><pre><font face="monospace">sudo chown -R &lt;your username&gt; /opt/graphite</font></pre></figure></p><h4 id="Configure-Graphite"><a href="#Configure-Graphite" class="headerlink" title="Configure Graphite"></a>Configure Graphite</h4><p>The following commands make all the default settings work.</p><figure class="highlight"><pre><font face="monospace">cd /opt/graphite
cp conf/carbon.conf{.example,}
cp conf/storage-schemas.conf{.example,}
cd webapp/graphite
# Modify this file to change database backend (default is sqlite).
cp local_settings.py{.example,}
python manage.py syncdb</font></pre></figure><h3 id="Launch-Carbon-amp-Graphite"><a href="#Launch-Carbon-amp-Graphite" class="headerlink" title="Launch Carbon &amp; Graphite"></a>Launch Carbon &amp; Graphite</h3><figure class="highlight"><pre><font face="monospace">python /opt/graphite/bin/carbon-cache.py start
python /opt/graphite/bin/run-graphite-devel-server.py /opt/graphite</font></pre></figure><p>(We can safely ignore this error: WHISPER_FALLOCATE_CREATE 
is enabled but linking failed.)</p><figure class="highlight"><pre><font face="monospace">Running Graphite from /opt/graphite under django development server
/usr/local/bin/django-admin runserver --pythonpath /opt/graphite/webapp --settings graphite.settings 0.0.0.0:8080
Performing system checks...
System check identified no issues (0 silenced).
July 20, 2016 - 10:05:58
Django version 1.8, using settings &apos;graphite.settings&apos;
Starting development server at <a href="http://0.0.0.0:8080/" target="_blank" rel="noopener">http://0.0.0.0:8080/</a>
Quit the server with CONTROL-C.</font></pre></figure><p>If everything goes well, we should see output like this, and Graphite should run without any broken images. </p><p>If you do see a broken image icon, go to <a href="http://0.0.0.0:8080/dashboard/" target="_blank" rel="noopener">http://0.0.0.0:8080/dashboard/</a> and you should see a Python stack trace. In my case, cairocffi was not installed correctly. Keep fixing until the web page no longer complains.</p><h3 id="Feeding-data-into-data-source"><a href="#Feeding-data-into-data-source" class="headerlink" title="Feeding data into data source"></a>Feeding data into data source</h3><p>Now you might be curious: what’s “carbon”? Carbon is a daemon that listens for time-series data and accepts it over a common set of protocols. <code>carbon-cache.py</code> accepts metrics over various protocols and writes them to disk as efficiently as possible. This requires caching metric values in RAM as they are received, and flushing them to disk on an interval using the underlying whisper library. 
It also provides a query service for in-memory metric data points, used by the Graphite webapp to retrieve “hot data”.</p><p>So, following the documentation, we now use the “plaintext” protocol to send some data to the data source every 60 seconds with the following command:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">watch -n 60 -d &apos;echo &quot;local.random.diceroll `jot -r 1 1 6` `date +%s`&quot; | nc -c 127.0.0.1 2003&apos;</span><br></pre></td></tr></table></figure><p>Basically, every 60 seconds it generates a random integer between 1 and 6 for the metric “local.random.diceroll”, attaches a timestamp, and sends it to the data source listening on port 2003 on localhost. If you don’t quite understand how it works, you can dig deeper by pasting the command into <a href="http://explainshell.com">explainshell.com</a>.</p><p>Now if we check the log via<br><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">tailf /opt/graphite/storage/log/carbon-cache/&lt;instance name&gt;/console.log</span><br></pre></td></tr></table></figure></p><p>You should see something like:<br><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">20/07/2016 21:58:36 :: Sorted 1 cache queues in 0.000091 seconds</span><br></pre></td></tr></table></figure></p><p>which means carbon has successfully received the data. </p><p>Then go to the Graphite webapp on localhost:8080; there should be a new node under the “Tree” tab: “Metrics” -&gt; “local” -&gt; “random” -&gt; “diceroll”.</p><p><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/grafana-graphite.png" alt="Graphite with data feed"></p><p>Bingo! 
Now there is only one step left to visualize it in Grafana.</p><p>Caution: there is one config file, <a href="http://graphite.readthedocs.io/en/latest/config-carbon.html#storage-schemas-conf" target="_blank" rel="noopener">storage-schema</a>, that you should pay attention to. With the default settings, you can only feed data whose timestamp is less than 24 hours old, because every metric matches the pattern here. </p><figure class="highlight"><pre><font face="monospace">[default_1min_for_1day]
pattern = .*
retentions = 60s:1d</font></pre></figure><p>As the manual says, ‘The first pattern that matches the metric name is used’, so new sections for your own data should be placed above the ‘default_1min_for_1day’ section. For example, if all of my metrics start with ‘koro’, the config file will look like:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">[koro]</span><br><span class="line">pattern = ^koro.*</span><br><span class="line">retentions = 60s:1y</span><br><span class="line"></span><br><span class="line">[default_1min_for_1day]</span><br><span class="line">pattern = .*</span><br><span class="line">retentions = 60s:1d</span><br></pre></td></tr></table></figure><h3 id="Grafana"><a href="#Grafana" class="headerlink" title="Grafana"></a>Grafana</h3><p>Now install Grafana via Homebrew and find where it is installed:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">$ brew install grafana</span><br><span class="line">$ brew list grafana | head -1</span><br><span 
class="line">/usr/local/Cellar/grafana/3.0.1/bin/grafana-cli</span><br></pre></td></tr></table></figure><p>It’s worth checking the Grafana docs on configuration:<br><a href="http://docs.grafana.org/installation/configuration/" target="_blank" rel="noopener">http://docs.grafana.org/installation/configuration/</a><br><a href="https://www.linode.com/docs/uptime/monitoring/deploy-graphite-with-grafana-on-ubuntu-14-04" target="_blank" rel="noopener">https://www.linode.com/docs/uptime/monitoring/deploy-graphite-with-grafana-on-ubuntu-14-04</a></p><p>but it’s also OK to use the default settings.</p><h3 id="Starting-Grafana-Server"><a href="#Starting-Grafana-Server" class="headerlink" title="Starting Grafana Server"></a>Starting Grafana Server</h3><p>Well, there seems to be a bug in 3.0.1, so starting Grafana from the command line is tricky. We should go to the <code>./share/grafana</code> directory and run the following command:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">/usr/local/Cellar/grafana/3.0.1/share/grafana (master)</span><br><span class="line">$ ../../bin/grafana-server</span><br></pre></td></tr></table></figure><p>Otherwise you may get an error like:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">[log.go:84 Fatal()] [E] Failed to parse defaults.ini, open /usr/local/Cellar/grafana/3.0.1/conf/defaults.ini: no such file or directory</span><br></pre></td></tr></table></figure><p>See <a href="https://github.com/grafana/grafana/issues/4531" target="_blank" rel="noopener">issue #4531</a> for more information.</p><p>Then go to localhost:3000, where you should see the login UI. Use the default admin user/password pair from defaults.ini to log in.</p><h3 id="Connect-Grafana-with-Graphite"><a href="#Connect-Grafana-with-Graphite" class="headerlink" 
title="Connect Grafana with Graphite"></a>Connect Grafana with Graphite</h3><p>After logging in, we can navigate from ‘top left icon’ -&gt; ‘data sources’ to configure which data source Grafana uses (or open <a href="http://localhost:3000/datasources/edit/1" target="_blank" rel="noopener">http://localhost:3000/datasources/edit/1</a> directly). </p><p><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/grafana-datasource.png" alt="Configure Data source"></p><p>And now we can set up a new dashboard!<br>Use ‘top left icon’ -&gt; ‘Dashboards’ -&gt; ‘New’ to add a new graph.<br>Then go to the ‘Graph’ section of the current dashboard, open the metrics tab, add “local” | “random” | “diceroll” as the metric name, and you should see the data points pop into the panel immediately.</p><p><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/grafana-test-dashboard.png" alt="Test Dashboard"></p>]]></content>
    
    <summary type="html">
    
      &lt;p&gt;&lt;img src=&quot;https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/grafana-demo2.png&quot; alt=&quot;Grafana&quot;&gt;&lt;/p&gt;
&lt;h1 id=&quot;Preface&quot;&gt;&lt;a href=&quot;#Preface&quot; class=&quot;headerlink&quot; title=&quot;Preface&quot;&gt;&lt;/a&gt;Preface&lt;/h1&gt;&lt;p&gt;When I first saw Grafana, I was immediately struck by its beauty, and I believe it is the right tool for most dashboard / monitoring use cases. This project, with 10,000+ stars, provides a complete solution for metrics and analytics.&lt;/p&gt;
&lt;p&gt;Check this &lt;a href=&quot;http://grafana.org/blog/2016/05/11/grafana-3-0-stable-released.html&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;live demo&lt;/a&gt; and you should feel the same pleasure as I do.&lt;/p&gt;
&lt;h1 id=&quot;How-does-Grafana-work&quot;&gt;&lt;a href=&quot;#How-does-Grafana-work&quot; class=&quot;headerlink&quot; title=&quot;How does Grafana work&quot;&gt;&lt;/a&gt;How does Grafana work&lt;/h1&gt;&lt;p&gt;In short, Grafana is a metrics solution that includes UI rendering, query, and data source features.&lt;br&gt;The UI rendering is amazing, different types of queries are supported, and it supports a wide range of data sources (mostly time-series databases). &lt;/p&gt;
&lt;p&gt;So, to set up Grafana for our own use, what should we do? &lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;First, we need to set up the data source. &lt;/li&gt;
&lt;li&gt;Second, we set up the Grafana server and connect it to the data source. &lt;/li&gt;
&lt;li&gt;Third, we feed data into the data source.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That’s it! &lt;/p&gt;
&lt;h1 id=&quot;Setting-up&quot;&gt;&lt;a href=&quot;#Setting-up&quot; class=&quot;headerlink&quot; title=&quot;Setting up&quot;&gt;&lt;/a&gt;Setting up&lt;/h1&gt;&lt;p&gt;The rest of this article covers how to set everything up on Mac OS X. Even with Homebrew and pip, it takes more effort than it seems. &lt;/p&gt;
&lt;h2 id=&quot;Graphite-as-Data-Source&quot;&gt;&lt;a href=&quot;#Graphite-as-Data-Source&quot; class=&quot;headerlink&quot; title=&quot;Graphite as Data Source&quot;&gt;&lt;/a&gt;Graphite as Data Source&lt;/h2&gt;&lt;p&gt;First, we need to choose a data source. Grafana supports many databases, including &lt;a href=&quot;http://docs.grafana.org/datasources/graphite/&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;Graphite&lt;/a&gt;, &lt;a href=&quot;http://docs.grafana.org/datasources/elasticsearch/&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;Elasticsearch&lt;/a&gt;, &lt;a href=&quot;http://docs.grafana.org/datasources/cloudwatch/&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;CloudWatch&lt;/a&gt;, &lt;a href=&quot;http://docs.grafana.org/datasources/influxdb/&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;InfluxDB&lt;/a&gt;, &lt;a href=&quot;http://docs.grafana.org/datasources/opentsdb/&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;OpenTSDB&lt;/a&gt;, &lt;a href=&quot;http://docs.grafana.org/datasources/kairosdb&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;KairosDB&lt;/a&gt;, and &lt;a href=&quot;http://docs.grafana.org/datasources/prometheus&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;Prometheus&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I will use &lt;strong&gt;Graphite&lt;/strong&gt; as our primary data source, for the following reasons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;powerful data APIs &lt;/li&gt;
&lt;li&gt;friendly render APIs that can return graphs as images&lt;/li&gt;
&lt;li&gt;data is stored in Whisper files, which is operations-friendly&lt;/li&gt;
&lt;li&gt;overall, the design of Graphite is very clean, and every layer of it is scalable.&lt;/li&gt;
&lt;/ul&gt;
    
    </summary>
    
      <category term="Engineering" scheme="http://fwz.github.io/categories/Engineering/"/>
    
      <category term="Dashboard" scheme="http://fwz.github.io/categories/Engineering/Dashboard/"/>
    
    
      <category term="Grafana" scheme="http://fwz.github.io/tags/Grafana/"/>
    
      <category term="Graphite" scheme="http://fwz.github.io/tags/Graphite/"/>
    
      <category term="Carbon" scheme="http://fwz.github.io/tags/Carbon/"/>
    
  </entry>
  
  <entry>
    <title>Thoughts on Technical Project Management</title>
    <link href="http://fwz.github.io/2015/11/21/Thoughts-of-Project-Management/"/>
    <id>http://fwz.github.io/2015/11/21/Thoughts-of-Project-Management/</id>
    <published>2015-11-21T09:08:35.000Z</published>
    <updated>2019-05-02T17:28:58.926Z</updated>
    
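    <content type="html"><![CDATA[<p><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/project-management.jpg" alt="Project management"></p><p>I recently read a few books on avoiding pitfalls in technical projects. Below I summarize the pitfalls, and what could be done better, grouped by area.</p><h2 id="工程质量-amp-复杂度控制"><a href="#工程质量-amp-复杂度控制" class="headerlink" title="Engineering Quality &amp; Complexity Control"></a>Engineering Quality &amp; Complexity Control</h2><ul><li>Anticipate the impact of change: requirements change, the system needs to scale further, and people come and go. The architect’s job is to manage change, or more precisely, to make sure the impact of change stays under control. The following techniques are therefore essential:<ul><li>Make only small, incremental changes</li><li>Write repeatable test cases and run them often</li><li>Make writing and running test cases easy</li><li>Track changes in dependencies</li><li>Automate repetitive tasks</li></ul></li><li>Drop clever designs. They make the system fragile and hard to maintain. The more straightforward a system is, the easier it is to extend.</li><li>Pay attention to performance. If performance is not kept under strict control from the start, the idea that “we don’t need to care about performance for now” spreads through the team, and it soon turns into “we have to fix a production performance problem right now”. Teaching the team how to use performance testing tools early on is very effective. If there really is no time for that, then for a team that relies on SQL, just teaching everyone how to use EXPLAIN and read its output brings a big improvement.</li><li>To evaluate the scalability of the system, try growing the business load along several dimensions and see which one gives out first. That is what you should focus on next.</li><li>Take UI design seriously. To the users of a product, the user interface is the system. Making the system respond 50% faster is certainly good, but the benefit of a hard-won optimization is easily cancelled out by the bad mood a poor interaction creates.<a id="more"></a></li><li>Don’t put blind faith in design patterns, and don’t use a design pattern just for the sake of using it.</li><li>Reuse is not only about architecture; it is also about people. If nobody knows a framework exists, or nobody knows how to reuse it, even the best framework is useless. That is why documentation matters.</li><li>Building a system is not a beauty contest. Don’t go hunting for flaws in pursuit of perfection.</li><li>Track how much time each week goes into fixing production issues.</li><li>A rose given the wrong name will grow into a cabbage (I really love these foreign metaphors). If you haven’t even worked out what to call something, don’t start on it; think it through first.</li><li>If the problem being solved is stable, a reliable solution is easy to reach. But no matter how thoroughly you plan before starting, the final result is never exactly what you planned.</li><li>Treat “new ideas” that come up during development with caution. Adopting a technology because it is cool, upgrading the system because a framework was upgraded, refactoring B because you are working on A, or starting a discussion because someone has an idea about the architecture: all of these are warning signs. They introduce change into the system and gradually push the project out of control.</li><li>Solve the problem with the best solution available today. If you think too much about future problems, you may fail to solve them and fail to solve today’s problem properly as well.</li><li>Quantify. Keep adding statistics and instrumentation data.</li><li>A quick-and-dirty bugfix is like a loan: you pay interest on it. The interest mainly shows up as a system that is harder to extend. To limit the damage, a reasonable approach is to ship the quick bugfix first, then invest time in a proper fix before the next release.</li></ul><h2 id="团队建设"><a href="#团队建设" class="headerlink" title="Team Building"></a>Team Building</h2><ul><li>Make the team more effective (make sure they have the right tools: CI, a code formatter, performance testing tools; make sure they have the right skills: run training every Friday on the skills they need, and invest in books and discussions; when introducing a process, make sure it solves a problem rather than creating a new one).</li><li>Find and keep passionate engineers.</li><li>Write your own job descriptions (JDs).</li><li>The engineers’ customer (the product people) is not the real customer; our customer’s customer is. So we need to consider what the real customer needs. For example, if product asks to skip data encryption for now in order to hit a deadline, engineers should point out the risk instead of silently accepting it, and look at the problem from the real customer’s point of view rather than that of whoever assigned the task.</li><li>Engineers deal with machines; architects deal with people, so learn to sell your ideas. How to sell an idea:<ul><li>Build a value proposition</li><li>Let the data speak, and set up monitoring early to get feedback</li><li>Pick the right moment to propose your solution (for example, when the previous framework has just been shown to fail)</li></ul></li></ul><h2 id="沟通"><a href="#沟通" class="headerlink" title="Communication"></a>Communication</h2><ul><li>When bargaining, start by asking for more than you need (to leave yourself room in the negotiation).</li><li>Challenge and verify assumptions. Is it really intolerable to users? Is this library really worse than that one?</li><li>Be an engineer who is easy to work with and communicates well, but is not a pushover.</li><li>Respect comes from making commitments and keeping them. This requires us to take budget and time constraints into account and do our best to make the system efficient.</li><li>Keep asking whoever raised a requirement whether it can be generalized as “in every situation, there will always be ...”. People answer this kind of question cautiously, so they keep refining and rethinking the requirement. Ask repeatedly and you end up with a very concise, core “essence of the system”. That essence is what we really need to focus on.</li></ul><h2 id="技术选型"><a href="#技术选型" class="headerlink" title="Technology Selection"></a>Technology Selection</h2><ul><li>Among the factors that drive a technology choice, requirements matter more than your résumé. Don’t adopt a new technology just to make your résumé look good. Look for frameworks that work well with other modules.</li><li>Before settling on a technology, actually try it out. Have two engineers spend time investigating the pros and cons and write up a comparison; in many cases the right choice then becomes obvious. Could this be a waste of time? Possibly. But it beats rushing into a suboptimal choice, starting work immediately, and discovering later that you have to go back.</li><li>Record the rationale behind the design: which decisions were made, why, and why the alternatives were rejected.</li><li>Take responsibility for every technical decision. Many correct designs still end in failure. To avoid that, at the very least communicate technical decisions to everyone involved.</li><li>Pick tools that fit your hand, and don’t switch lightly.</li></ul><h2 id="工程师的自身成长"><a href="#工程师的自身成长" class="headerlink" title="Personal Growth as an Engineer"></a>Personal Growth as an Engineer</h2><ul><li>Understand hardware. At the very least, know what the state of the system depends on and the performance class of each kind of hardware, and be able to locate a problem from the alert item when a production alert fires.</li><li>A big problem in the software industry is that engineers have to solve problems far beyond their current understanding. Learning and communication are therefore key skills.</li><li>An architect is an engineer first. If I designed it, I should at least be able to implement it myself.</li></ul>]]></content>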
    
    <summary type="html">
    
      &lt;p&gt;&lt;img src=&quot;https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/project-management.jpg&quot; alt=&quot;Project management&quot;&gt;&lt;/p&gt;
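&lt;p&gt;I recently read a few books on avoiding pitfalls in technical projects. Below I summarize the pitfalls, and what could be done better, grouped by area.&lt;/p&gt;
&lt;h2 id=&quot;工程质量-amp-复杂度控制&quot;&gt;&lt;a href=&quot;#工程质量-amp-复杂度控制&quot; class=&quot;headerlink&quot; title=&quot;Engineering Quality &amp;amp; Complexity Control&quot;&gt;&lt;/a&gt;Engineering Quality &amp;amp; Complexity Control&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;Anticipate the impact of change: requirements change, the system needs to scale further, and people come and go. The architect’s job is to manage change, or more precisely, to make sure the impact of change stays under control. The following techniques are therefore essential:&lt;ul&gt;
&lt;li&gt;Make only small, incremental changes&lt;/li&gt;
&lt;li&gt;Write repeatable test cases and run them often&lt;/li&gt;
&lt;li&gt;Make writing and running test cases easy&lt;/li&gt;
&lt;li&gt;Track changes in dependencies&lt;/li&gt;
&lt;li&gt;Automate repetitive tasks&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Drop clever designs. They make the system fragile and hard to maintain. The more straightforward a system is, the easier it is to extend.&lt;/li&gt;
&lt;li&gt;Pay attention to performance. If performance is not kept under strict control from the start, the idea that “we don’t need to care about performance for now” spreads through the team, and it soon turns into “we have to fix a production performance problem right now”. Teaching the team how to use performance testing tools early on is very effective. If there really is no time for that, then for a team that relies on SQL, just teaching everyone how to use EXPLAIN and read its output brings a big improvement.&lt;/li&gt;
&lt;li&gt;To evaluate the scalability of the system, try growing the business load along several dimensions and see which one gives out first. That is what you should focus on next.&lt;/li&gt;
&lt;li&gt;Take UI design seriously. To the users of a product, the user interface is the system. Making the system respond 50% faster is certainly good, but the benefit of a hard-won optimization is easily cancelled out by the bad mood a poor interaction creates.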
    
    </summary>
    
    
      <category term="Book Review, Thoughts" scheme="http://fwz.github.io/tags/Book-Review-Thoughts/"/>
    
  </entry>
  
  <entry>
    <title>Unit Test 101</title>
    <link href="http://fwz.github.io/2015/05/09/Unit-Test-101/"/>
    <id>http://fwz.github.io/2015/05/09/Unit-Test-101/</id>
    <published>2015-05-09T08:08:20.000Z</published>
    <updated>2019-05-02T17:20:41.788Z</updated>
    
    <content type="html"><![CDATA[<p><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/testing.png" alt=""></p><p>The importance of writing unit tests has been heavily discussed. However, reality is cruel. Many developers do not write UT, an even larger number write token tests just to pass the code coverage bar, and some ambitious developers are not quite sure how to write good unit tests. Most good UT are written by top developers.</p><p>I consider myself one of those ambitious developers, so I spent two weekends starting to learn it. Most cases listed below originate from, or are revised from, the book <a href="http://www.amazon.com/Practical-Unit-Testing-TestNG-Mockito/dp/839348930X" target="_blank" rel="noopener">“Practical Unit Testing with TestNG and Mockito”</a> by <a href="http://kaczanowscy.pl/tomek/" target="_blank" rel="noopener">Tomek Kaczanowski</a>. It is a really good book and worth reading.</p><hr><h2 id="Concepts"><a href="#Concepts" class="headerlink" title="Concepts"></a>Concepts</h2><p>Before we start, let’s visit the concepts of SUT and DOC.</p><a id="more"></a><h3 id="SUT-amp-DOC"><a href="#SUT-amp-DOC" class="headerlink" title="SUT &amp; DOC"></a>SUT &amp; DOC</h3><p>SUT, or <strong>System Under Test</strong>, is understood as <strong>the part of the system being tested</strong>. Depending on the type of test, the SUT may be of very different granularity – from a single class to a whole application. </p><p>DOC, or <strong>Depended On Component</strong>, is any entity that is required by an SUT to fulfill its duties.</p><p>For example, I have recently been working with <a href="https://spring.io/" target="_blank" rel="noopener">Spring</a> and implementing a Filter class, which takes in a <code>ServletRequest</code> object, a <code>ServletResponse</code> object and a <code>FilterChain</code> object. When writing its unit test, the SUT is the <code>Filter</code> class I am working on. 
The DOCs are the <code>ServletRequest</code>, the <code>ServletResponse</code> and the <code>FilterChain</code>.</p><h3 id="Test-Type"><a href="#Test-Type" class="headerlink" title="Test Type"></a>Test Type</h3><p>There are many types of tests (and names) in software development, but the most important are: </p><ul><li><strong>unit tests</strong>, which help ensure high-quality code</li><li><strong>integration tests</strong>, which verify that different modules cooperate effectively</li><li><strong>end-to-end tests</strong>, which put the system through its paces in ways that reflect the standpoint of users</li></ul><p>Simple. Let’s start to discuss UT (Unit Test).</p><hr><h2 id="Unit-Test"><a href="#Unit-Test" class="headerlink" title="Unit Test"></a>Unit Test</h2><h3 id="Structure"><a href="#Structure" class="headerlink" title="Structure"></a>Structure</h3><p>The structure of a UT is:</p><ul><li><strong>Arrange</strong> (set up the environment / foundation of the test)</li><li><strong>Act</strong> (run the code under test)</li><li><strong>Assert</strong> (compare the expected result with the actual result)</li></ul><h3 id="Frequency-of-running-UT"><a href="#Frequency-of-running-UT" class="headerlink" title="Frequency of running UT"></a>Frequency of running UT</h3><p>UT are supposed to be run repeatedly and frequently. Even if we are not using TDD, all the UT in the project should be run after any feature is implemented.</p><h3 id="Bad-smells"><a href="#Bad-smells" class="headerlink" title="Bad smells"></a>Bad smells</h3><p>A test is not a unit test if:</p><ul><li>It talks to the database</li><li>It communicates across the network</li><li>It touches the file system</li><li>It can’t run at the same time as any of your other unit tests</li><li>It might fail even if the code is correct, or it might pass even if the code is not correct.</li></ul><p>The above points make writing UT less than straightforward. To keep our UT clean, some knowledge, such as mocking, is needed. 
We also need to master some skills to make full use of our test framework.</p><h3 id="Framework-and-tools"><a href="#Framework-and-tools" class="headerlink" title="Framework and tools"></a>Framework and tools</h3><p>Based on some investigation, I chose TestNG and Mockito, and eventually found this book. This is just a personal preference. JUnit provides mostly the same functionality as TestNG. Anything we badly need should be available in any mature framework.</p><hr><p>Let’s start coding to demonstrate what we have listed above. Here is a <code>Money</code> class, with two properties, amount and currency.</p><h3 id="Compatability"><a href="#Compatability" class="headerlink" title="Compatibility"></a>Compatibility</h3><figure class="highlight java"><figcaption><span>Money.java</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br></pre></td><td class="code"><pre><span class="line"></span><br><span 
class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">Money</span> </span>&#123;</span><br><span class="line">    <span class="keyword">private</span> <span class="keyword">final</span> <span class="keyword">int</span> amount;</span><br><span class="line">    <span class="keyword">private</span> <span class="keyword">final</span> String currency;</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">public</span> <span class="title">Money</span><span class="params">(<span class="keyword">int</span> amount, String currency)</span> </span>&#123;</span><br><span class="line">        <span class="keyword">if</span> (amount &lt; <span class="number">0</span>) &#123;</span><br><span class="line">            <span class="keyword">throw</span> <span class="keyword">new</span> IllegalArgumentException(<span class="string">"illegal negative amount: ["</span> + amount + <span class="string">"]"</span>);</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="keyword">if</span> (currency == <span class="keyword">null</span> || currency.isEmpty()) &#123;</span><br><span class="line">            <span class="keyword">throw</span> <span class="keyword">new</span> IllegalArgumentException(<span class="string">"illegal currency: ["</span> + currency + <span class="string">"], it can not be null or empty"</span>);</span><br><span class="line">        &#125;</span><br><span class="line"></span><br><span class="line">        <span class="keyword">this</span>.amount = amount;</span><br><span class="line">        <span class="keyword">this</span>.currency = currency;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">public</span> <span class="keyword">int</span> <span class="title">getAmount</span><span class="params">()</span> 
</span>&#123;</span><br><span class="line">        <span class="keyword">return</span> amount;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">public</span> String <span class="title">getCurrency</span><span class="params">()</span> </span>&#123;</span><br><span class="line">        <span class="keyword">return</span> currency;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">public</span> <span class="keyword">boolean</span> <span class="title">equals</span><span class="params">(Object o)</span> </span>&#123;</span><br><span class="line">        <span class="keyword">if</span> (o <span class="keyword">instanceof</span> Money) &#123;</span><br><span class="line">            Money money = (Money) o;</span><br><span class="line">            <span class="keyword">return</span> money.getCurrency().equals(getCurrency()) &amp;&amp; getAmount() == money.getAmount();</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="keyword">return</span> <span class="keyword">false</span>;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Now let’s write a very simple test to check whether the constructor works as expected. 
Here the test uses both the TestNG style and the JUnit style of assertion (the latter is supported in TestNG through <code>AssertJUnit</code>). Note the argument order: TestNG takes (actual, expected), while the JUnit style takes (expected, actual).</p><figure class="highlight java"><figcaption><span>MoneyTest.java</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> org.testng.annotations.Test;</span><br><span class="line"><span class="keyword">import</span> <span class="keyword">static</span> org.testng.Assert.assertEquals;</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> org.testng.AssertJUnit;</span><br><span class="line"><span class="meta">@Test</span></span><br><span class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">MoneyTest</span> </span>&#123;</span><br><span class="line">    <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">constructorShouldSetAmountAndCurrency</span><span class="params">()</span> </span>&#123;</span><br><span class="line">        Money money = <span class="keyword">new</span> Money(<span class="number">10</span>, <span class="string">"USD"</span>);   <span class="comment">// arrange</span></span><br><span class="line"></span><br><span class="line">        <span class="comment">// a TestNG Way</span></span><br><span class="line">        assertEquals(money.getAmount(), <span 
class="number">10</span>);  <span class="comment">// act and assert</span></span><br><span class="line">        assertEquals(money.getCurrency(), <span class="string">"USD"</span>);</span><br><span class="line"></span><br><span class="line">        <span class="comment">// a JUnit Way</span></span><br><span class="line">        AssertJUnit.assertEquals(<span class="string">"USD"</span>, money.getCurrency());</span><br><span class="line">        AssertJUnit.assertEquals(<span class="number">10</span>, money.getAmount());</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><hr><h2 id="Parameter-Tests"><a href="#Parameter-Tests" class="headerlink" title="Parameter Tests"></a>Parameter Tests</h2><p>Now the first important trick. For many cases, one set of input/output are not sufficient when writing UT.<br><figure class="highlight java"><figcaption><span>NotSoGoodTest.java</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="keyword">void</span> Test&#123;</span><br><span class="line"></span><br><span class="line">    TestObject to1 = <span class="keyword">new</span> TestObject();</span><br><span class="line">    InputParam input1 = <span class="keyword">new</span> InputParam();</span><br><span class="line">    OutputParam output1 = <span 
class="keyword">new</span> OutputParam();</span><br><span class="line"></span><br><span class="line">    assertEquals(to1.calc(input1), output1);</span><br><span class="line"></span><br><span class="line">    TestObject to2 = <span class="keyword">new</span> TestObject();</span><br><span class="line">    InputParam input2 = <span class="keyword">new</span> InputParam();</span><br><span class="line">    OutputParam output2 = <span class="keyword">new</span> OutputParam();</span><br><span class="line"></span><br><span class="line">    assertEquals(to2.calc(input2), output2);</span><br><span class="line"></span><br><span class="line">&#125;</span><br><span class="line"></span><br></pre></td></tr></table></figure><br>If we don’t want to write code like this, then <em>Parameter Test</em> is our friend. Use a <a href="http://testng.org/javadoc/org/testng/annotations/DataProvider.html" target="_blank" rel="noopener">DataProvider</a> annotation to mark a method as <strong>supplying data</strong> for a test method. 
It returns a 2D array, that is, a list of Object[]. Each Object[] holds the arguments for one invocation of the test method.</p><figure class="highlight java"><figcaption><span>MoneyTest.java</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> org.testng.annotations.DataProvider;</span><br><span class="line"><span class="keyword">import</span> org.testng.annotations.Test;</span><br><span class="line"><span class="keyword">import</span> <span class="keyword">static</span> org.testng.Assert.assertEquals;</span><br><span class="line"></span><br><span class="line"><span class="meta">@Test</span></span><br><span class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">MoneyTest</span> </span>&#123;</span><br><span class="line"></span><br><span class="line">    <span class="comment">// Parameter tests</span></span><br><span class="line">    <span class="meta">@DataProvider</span></span><br><span class="line">     <span class="keyword">private</span> <span class="keyword">static</span> <span class="keyword">final</span> Object[][] 
getMoney()&#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="keyword">new</span> Object[][] &#123;</span><br><span class="line">            &#123;<span class="number">10</span>, <span class="string">"USD"</span>&#125;,</span><br><span class="line">            &#123;<span class="number">20</span>, <span class="string">"EUR"</span>&#125;,</span><br><span class="line">            &#123;<span class="number">30</span>, <span class="string">"CNY"</span>&#125;</span><br><span class="line">        &#125;;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="meta">@Test</span>(dataProvider = <span class="string">"getMoney"</span>)</span><br><span class="line">    <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">constructorShouldSetAmountAndCurrency</span><span class="params">(</span></span></span><br><span class="line"><span class="function"><span class="params">            <span class="keyword">int</span> amount, String currency)</span> </span>&#123;</span><br><span class="line">        Money money = <span class="keyword">new</span> Money(amount, currency);</span><br><span class="line">        assertEquals(money.getAmount(), amount);</span><br><span class="line">        assertEquals(money.getCurrency(), currency);</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>See? A DataProvider annotation helps you define a list of argument sets. When a test method is linked to the data provider method, TestNG automatically passes each argument set to the test method as parameters. 
The code above generates 3 tests.</p><p>There are some advanced usages of Parameter Tests:</p><p>A DataProvider can live in another class so that it can be reused (this needs a change to the test method: use the <code>@Test(dataProvider = &quot;getMoney&quot;, dataProviderClass = DataProviders.class)</code> annotation).</p><p>A DataProvider can do lazy initialization. When you are going to generate many test cases (say 10000+), they don’t all need to be created before the test starts, which would waste a lot of resources if a case fails in the middle. (This requires a data provider that returns an Iterator&lt;Object[]&gt;. The example below only shows the return type; it still builds all cases up front, so a truly lazy provider would need a custom iterator that creates each case on demand.)</p><figure class="highlight plain"><figcaption><span>dataProviderIterator.java</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">@DataProvider(name=&quot;colors&quot;) </span><br><span class="line">public Iterator&lt;Object[]&gt; getColors()&#123;</span><br><span class="line">  Set&lt;Object[]&gt; result=new HashSet&lt;Object[]&gt;();</span><br><span class="line">  result.add(new Object[]&#123;&quot;black&quot;&#125;);</span><br><span class="line">  result.add(new Object[]&#123;&quot;silver&quot;&#125;);</span><br><span class="line">  result.add(new Object[]&#123;&quot;gray&quot;&#125;);</span><br><span class="line">  return result.iterator();</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><hr><h2 id="Testing-Exceptions"><a href="#Testing-Exceptions" class="headerlink" title="Testing Exceptions"></a>Testing Exceptions</h2><p>Here is a simple case showing how to write a UT that tests an exception. 
Some developers might actually put 4 steps in a single test (just to improve coverage): </p><ul><li>start the SUT</li><li>pass an invalid value</li><li>catch the exception</li><li>handle it </li></ul><p>However, TestNG provides a much cleaner way to test it: the <code>expectedExceptions</code> attribute.</p><p>(Why bother handling it in the manual version? Because if we don’t handle the exception, the test fails.) </p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line">public class MoneyIAETest &#123;</span><br><span class="line">    private final static int VALID_AMOUNT = 5;</span><br><span class="line">    private final static String VALID_CURRENCY = &quot;USD&quot;;</span><br><span class="line"></span><br><span class="line">    @DataProvider</span><br><span class="line">    private static final Object[][] getInvalidAmount()&#123;</span><br><span class="line">        return new Integer[][] &#123; &#123;-12387&#125;, &#123;-5&#125;, &#123;-1&#125; &#125;;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    @Test(dataProvider = 
&quot;getInvalidAmount&quot;,</span><br><span class="line">        expectedExceptions = IllegalArgumentException.class)</span><br><span class="line">    public void shouldThrowIAEForInvalidAmount(int invalidAmount) &#123;</span><br><span class="line">        Money money = new Money(invalidAmount, VALID_CURRENCY);</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    @DataProvider</span><br><span class="line">    private static final Object[][] getInvalidCurrency()&#123;</span><br><span class="line">        return new String[][] &#123; &#123;null&#125;, &#123;&quot;&quot;&#125; &#125;;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    @Test(dataProvider = &quot;getInvalidCurrency&quot;,</span><br><span class="line">        expectedExceptions = IllegalArgumentException.class)</span><br><span class="line">    public void shouldThrowIAEForInvalidCurrency(String invalidCurrency) &#123;</span><br><span class="line">        Money money = new Money(VALID_AMOUNT, invalidCurrency);</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><hr><h3 id="Testing-Concurrent-Codes"><a href="#Testing-Concurrent-Codes" class="headerlink" title="Testing Concurrent Code"></a>Testing Concurrent Code</h3><p>Testing concurrency is, to some extent, a nightmare. If the test is not implemented correctly, the quality of the test itself becomes a problem. 
Let’s introduce two more <code>@Test</code> attributes that are helpful for testing concurrent code.</p><ul><li>threadPoolSize, which sets the number of threads that are to execute a test method </li><li>invocationCount, which sets the total number of test method executions.</li></ul><p>There is no need for us to implement threads ourselves.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"></span><br><span class="line">public class SystemIdGenerator implements IdGenerator &#123;</span><br><span class="line">    private static Long nextId = System.currentTimeMillis();</span><br><span class="line"></span><br><span class="line">    // is it thread safe?</span><br><span class="line">    public Long nextId() &#123;</span><br><span class="line">        return nextId++;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><figure class="highlight java"><figcaption><span>SystemIdGeneratorTest.java</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span 
class="line">21</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">SystemIdGeneratorTest</span> </span>&#123;</span><br><span class="line">    <span class="keyword">private</span> IdGenerator idGen = <span class="keyword">new</span> SystemIdGenerator();</span><br><span class="line"></span><br><span class="line">    <span class="meta">@Test</span></span><br><span class="line">    <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">idsShouldBeUnique</span><span class="params">()</span> </span>&#123;</span><br><span class="line">        Long idA = idGen.nextId();</span><br><span class="line">        Long idB = idGen.nextId();</span><br><span class="line">        assertNotEquals(idA, idB, <span class="string">"idA "</span> + idA + <span class="string">" idB "</span> + idB);</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="meta">@Test</span></span><br><span class="line">    <span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">JVMUniqueIdGeneratorParallelTest</span> </span>&#123;</span><br><span class="line">        <span class="keyword">private</span> IdGenerator idGen = <span class="keyword">new</span> SystemIdGenerator();</span><br><span class="line">        <span class="keyword">private</span> Set&lt;Long&gt; ids = Collections.synchronizedSet(<span class="keyword">new</span> HashSet&lt;Long&gt;(<span class="number">10000</span>)); <span class="comment">// a plain HashSet is not thread-safe</span></span><br><span class="line"></span><br><span class="line">        <span class="meta">@Test</span>(threadPoolSize = <span class="number">997</span>, invocationCount = <span class="number">10000</span>)</span><br><span class="line">        <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span 
class="title">idsShouldBeUnique</span><span class="params">()</span> </span>&#123;</span><br><span class="line">            assertTrue(ids.add(idGen.nextId()));</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>I got 307 failed invocations on my laptop with the above code. It fails because the <code>++</code> operator is not atomic: it reads, increments and writes back in separate steps, so two threads can read the same value of <code>nextId</code>.</p><p>In addition to the attributes above, you can also use the <code>timeOut</code> and <code>invocationTimeOut</code> attributes of the <code>@Test</code> annotation. They abort the test execution and fail the test if it takes too long (e.g. if your code has caused a deadlock or entered an infinite loop).</p><h3 id="Collection-Testing"><a href="#Collection-Testing" class="headerlink" title="Collection Testing"></a>Collection Testing</h3><ul><li>Unitils</li><li>Hamcrest</li><li>FEST Fluent Assertion</li></ul><h4 id="Unitils"><a href="#Unitils" class="headerlink" title="Unitils"></a>Unitils</h4><p>Unitils lets you customize how equality is defined, for example by ignoring the order of elements.</p><figure class="highlight java"><figcaption><span>SetEqualityTest.java</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span 
class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> org.testng.annotations.BeforeMethod;</span><br><span class="line"><span class="keyword">import</span> org.testng.annotations.Test;</span><br><span class="line"><span class="keyword">import</span> org.unitils.reflectionassert.ReflectionComparatorMode;</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> java.util.LinkedHashSet;</span><br><span class="line"><span class="keyword">import</span> java.util.Set;</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> <span class="keyword">static</span> org.unitils.reflectionassert.ReflectionAssert.assertReflectionEquals;</span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">SetEqualityTest</span> </span>&#123;</span><br><span class="line">    <span class="comment">// same setA and setB created as in the previous TestNG example</span></span><br><span class="line"></span><br><span class="line">    Set&lt;Integer&gt; 
setA;</span><br><span class="line">    Set&lt;Integer&gt; setB;</span><br><span class="line"></span><br><span class="line">    <span class="meta">@BeforeMethod</span></span><br><span class="line">    <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">setUp</span><span class="params">()</span> </span>&#123;</span><br><span class="line">        setA = <span class="keyword">new</span> LinkedHashSet&lt;Integer&gt;();</span><br><span class="line">        setB = <span class="keyword">new</span> LinkedHashSet&lt;Integer&gt;();</span><br><span class="line"></span><br><span class="line">        setA.add(<span class="number">1</span>);</span><br><span class="line">        setA.add(<span class="number">2</span>);</span><br><span class="line"></span><br><span class="line">        setB.add(<span class="number">2</span>);</span><br><span class="line">        setB.add(<span class="number">1</span>);</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="meta">@Test</span></span><br><span class="line">    <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">twoSetsAreEqualsIfTheyHaveSameContentAndSameOrder</span><span class="params">()</span> </span>&#123;</span><br><span class="line">        <span class="comment">// assertReflectionEquals(setA, setB);</span></span><br><span class="line"></span><br><span class="line">        <span class="comment">// This will failed with following error</span></span><br><span class="line">        <span class="comment">// --- Found following differences ---</span></span><br><span class="line">        <span class="comment">/*</span></span><br><span class="line"><span class="comment">        [0]: expected: 1, actual: 2</span></span><br><span class="line"><span class="comment">        [1]: expected: 2, actual: 1</span></span><br><span class="line"><span 
class="comment"></span></span><br><span class="line"><span class="comment">        --- Difference detail tree ---</span></span><br><span class="line"><span class="comment">         expected: [1, 2]</span></span><br><span class="line"><span class="comment">           actual: [2, 1]</span></span><br><span class="line"><span class="comment">        */</span></span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="meta">@Test</span></span><br><span class="line">    <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">twoSetsAreEqualsIfTheyHaveSameContentAndAnyOrder</span><span class="params">()</span> </span>&#123;</span><br><span class="line">        assertReflectionEquals(setA, setB,</span><br><span class="line">                ReflectionComparatorMode.LENIENT_ORDER);</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h4 id="Fest-Fluent-Assertion"><a href="#Fest-Fluent-Assertion" class="headerlink" title="Fest Fluent Assertion"></a>Fest Fluent Assertion</h4><p>FEST Fluent Assertions, which is a part of FEST library, offers many assertions which can simplify collections testing. 
It also provides a fluent interface, which allows for the chaining together of assertions.</p><figure class="highlight java"><figcaption><span>FestCollectionTest.java</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span 
class="line">56</span><br><span class="line">57</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> org.fest.assertions.api.MapAssert;</span><br><span class="line"><span class="keyword">import</span> org.testng.annotations.BeforeMethod;</span><br><span class="line"><span class="keyword">import</span> org.testng.annotations.Test;</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> java.util.HashMap;</span><br><span class="line"><span class="keyword">import</span> java.util.LinkedHashMap;</span><br><span class="line"><span class="keyword">import</span> java.util.LinkedHashSet;</span><br><span class="line"><span class="keyword">import</span> java.util.Set;</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> <span class="keyword">static</span> org.fest.assertions.api.Assertions.assertThat;</span><br><span class="line"><span class="keyword">import</span> <span class="keyword">static</span> org.fest.assertions.api.Assertions.entry;</span><br><span class="line"></span><br><span class="line"><span class="comment">/**</span></span><br><span class="line"><span class="comment"> * Created by wenzhong on 5/9/15.</span></span><br><span class="line"><span class="comment"> */</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">FestCollectionTest</span> </span>&#123;</span><br><span class="line">    <span class="comment">// same setA and setB created as in the previous TestNG example</span></span><br><span class="line"></span><br><span class="line">    Set&lt;Integer&gt; setA;</span><br><span class="line">    Set&lt;Integer&gt; setB;</span><br><span class="line"></span><br><span class="line">    <span class="meta">@BeforeMethod</span></span><br><span class="line">    <span class="function"><span class="keyword">public</span> 
<span class="keyword">void</span> <span class="title">setUp</span><span class="params">()</span> </span>&#123;</span><br><span class="line">        setA = <span class="keyword">new</span> LinkedHashSet&lt;Integer&gt;();</span><br><span class="line">        setB = <span class="keyword">new</span> LinkedHashSet&lt;Integer&gt;();</span><br><span class="line"></span><br><span class="line">        setA.add(<span class="number">1</span>);</span><br><span class="line">        setA.add(<span class="number">2</span>);</span><br><span class="line"></span><br><span class="line">        setB.add(<span class="number">2</span>);</span><br><span class="line">        setB.add(<span class="number">1</span>);</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="meta">@Test</span></span><br><span class="line">    <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">collectionsUtilityMethods</span><span class="params">()</span> </span>&#123;</span><br><span class="line">        assertThat(setA)</span><br><span class="line">            .isNotEmpty()</span><br><span class="line">            .hasSize(<span class="number">2</span>)</span><br><span class="line">            .doesNotHaveDuplicates();</span><br><span class="line"></span><br><span class="line">        assertThat(setA).containsOnly(<span class="number">1</span>, <span class="number">2</span>);</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="meta">@Test</span></span><br><span class="line">    <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">mapUtilityMethods</span><span class="params">()</span> </span>&#123;</span><br><span class="line">        HashMap&lt;String, Integer&gt; map = <span class="keyword">new</span> LinkedHashMap&lt;String, Integer&gt;();</span><br><span 
class="line">        map.put(<span class="string">"a"</span>, <span class="number">2</span>);</span><br><span class="line">        map.put(<span class="string">"b"</span>, <span class="number">3</span>);</span><br><span class="line"></span><br><span class="line">        assertThat(map)</span><br><span class="line">            .isNotNull()</span><br><span class="line">            .isNotEmpty()</span><br><span class="line">            .contains(entry(<span class="string">"a"</span>, <span class="number">2</span>), entry(<span class="string">"b"</span>, <span class="number">3</span>))</span><br><span class="line">            .doesNotContainKey(<span class="string">"c"</span>);</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Because the interface is fluent, the checks for emptiness, size and the absence of duplicates all fit into a single chained statement, and the code is still readable.</p><hr><h2 id="Dependency-between-tests"><a href="#Dependency-between-tests" class="headerlink" title="Dependency between tests"></a>Dependency between tests</h2><p>Now consider a test that verifies user account management, which has two functions: adding and deleting user accounts. In traditional unit testing, tests are independent, so the test code might look like <code>testAddition</code> and <code>testDeletion</code> below. Note that the addition then runs twice, which breaks the “DRY” rule.</p><p>TestNG, however, offers another attribute of the <code>@Test</code> annotation, <code>dependsOnMethods</code>, to solve this problem. This time, an explicit dependency is established between the tests, as in <code>testDeletion2</code>. 
</p><figure class="highlight java"><figcaption><span>TestWithDependencyTest.java</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> org.testng.annotations.Test;</span><br><span class="line"></span><br><span class="line"><span class="comment">/**</span></span><br><span class="line"><span class="comment"> * Created by wenzhong on 5/9/15.</span></span><br><span class="line"><span class="comment"> */</span></span><br><span class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">TestWithDependencyTest</span> </span>&#123;</span><br><span class="line"></span><br><span class="line">    <span class="meta">@Test</span></span><br><span class="line">    <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">testAddition</span><span class="params">()</span> </span>&#123;</span><br><span class="line">        <span class="comment">// adds user X to the system</span></span><br><span class="line">  
      <span class="comment">// verifies it exists, by issuing SQL query against the database</span></span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">    <span class="meta">@Test</span></span><br><span class="line">    <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">testDeletion</span><span class="params">()</span> </span>&#123;</span><br><span class="line">        <span class="comment">// adds user Y to the system</span></span><br><span class="line">        <span class="comment">// verifies that it exists (so later we know that it was actually removed)</span></span><br><span class="line">        <span class="comment">// removes user Y</span></span><br><span class="line">        <span class="comment">// makes sure it does not exist, by issuing SQL query against the database</span></span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="meta">@Test</span>(dependsOnMethods = <span class="string">"testAddition"</span>)</span><br><span class="line">    <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">testDeletion2</span><span class="params">()</span> </span>&#123;</span><br><span class="line">        <span class="comment">// removes user X</span></span><br><span class="line">        <span class="comment">// makes sure it does not exist, by issuing SQL query against the database</span></span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>However, dependencies between tests are not always a good thing. In a test suite with dozens of tests, adding a new test can become a nightmare, because you need to find the correct spot for it in the dependency tree. An alternative is to initialize the required state before each test. 
Dependencies between tests are better reserved for integration and E2E tests.</p><h3 id="Private-Method-Testing"><a href="#Private-Method-Testing" class="headerlink" title="Private Method Testing"></a>Private Method Testing</h3><p>Should we test private methods? Of course. But as a best practice, we should test them through the public methods that call them. With legacy code, however, we face a dilemma: we need tests to make sure the code works as expected, but writing those tests requires refactoring the code, and without tests we cannot know whether the refactoring is correct.</p><p>So when facing legacy code, we compromise and test private methods directly.</p><h4 id="Built-in-support-in-Java"><a href="#Built-in-support-in-Java" class="headerlink" title="Built-in support in Java"></a>Built-in support in Java</h4><p>Let’s first examine what Java supports natively: reflection.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span 
class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br></pre></td><td class="code"><pre><span class="line"></span><br><span class="line">public class ClassWithPrivateMethod &#123;</span><br><span class="line">    private boolean privateMethod(Long param) &#123;</span><br><span class="line">        return true;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">import org.testng.annotations.Test;</span><br><span class="line"></span><br><span class="line">import java.lang.reflect.InvocationTargetException;</span><br><span class="line">import java.lang.reflect.Method;</span><br><span class="line"></span><br><span class="line">import static org.testng.Assert.*;</span><br><span class="line"></span><br><span class="line">public class ClassWithPrivateMethodTest &#123;</span><br><span class="line">    @Test</span><br><span class="line">    public void testingPrivateMethodWithReflection()</span><br><span class="line">            throws NoSuchMethodException, InvocationTargetException,</span><br><span class="line">                   IllegalAccessException &#123;</span><br><span class="line"></span><br><span class="line">        // Note: this is an ugly implementation</span><br><span class="line"></span><br><span class="line">        ClassWithPrivateMethod sut = new ClassWithPrivateMethod();</span><br><span class="line"></span><br><span class="line">        Class[] parameterTypes = new Class[1];</span><br><span class="line">        parameterTypes[0] = java.lang.Long.class;</span><br><span class="line"></span><br><span class="line">        Method m = sut.getClass()</span><br><span class="line">            .getDeclaredMethod(&quot;privateMethod&quot;, parameterTypes);</span><br><span class="line"></span><br><span class="line">        
// make it accessible from outside the class</span><br><span class="line">        m.setAccessible(true);</span><br><span class="line"></span><br><span class="line">        Object[] parameters = new Object[1];</span><br><span class="line">        parameters[0] = 5569L;</span><br><span class="line"></span><br><span class="line">        // actually invoke the private method</span><br><span class="line">        Boolean result = (Boolean) m.invoke(sut, parameters);</span><br><span class="line">        assertTrue(result);</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br></pre></td></tr></table></figure><h4 id="Using-PowerMock"><a href="#Using-PowerMock" class="headerlink" title="Using PowerMock"></a>Using PowerMock</h4><p><a href="https://code.google.com/p/powermock/wiki/MockPrivate" target="_blank" rel="noopener">PowerMock</a> gives us a cleaner way to test private methods. The <a href="http://powermock.googlecode.com/svn/docs/powermock-1.3.7/apidocs/org/powermock/reflect/Whitebox.html" target="_blank" rel="noopener">Whitebox</a> class provides various utilities for accessing the internals of a class. 
Basically it’s a simplified reflection utility intended for tests.<br><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line">import org.powermock.reflect.Whitebox;</span><br><span class="line">import org.testng.annotations.Test;</span><br><span class="line"></span><br><span class="line">import static org.testng.Assert.assertTrue;</span><br><span class="line"></span><br><span class="line">public class ClassWithPrivateMethodTest &#123;</span><br><span class="line">    @Test</span><br><span class="line">    public void testingPrivateMethodWithReflectionByPowerMock()</span><br><span class="line">            throws Exception &#123;</span><br><span class="line">        ClassWithPrivateMethod sut = new ClassWithPrivateMethod();</span><br><span class="line">        // Whitebox finds the private method by name and argument types</span><br><span class="line">        Boolean result = Whitebox</span><br><span class="line">            .invokeMethod(sut, &quot;privateMethod&quot;, 302483L);</span><br><span class="line">        assertTrue(result);</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure></p><h3 id="Testing-non-dependency-injection-code"><a href="#Testing-non-dependency-injection-code" class="headerlink" title="Testing non dependency injection code"></a>Testing non dependency injection code</h3><p>Consider the following production code, which is hard to test because it creates its collaborator itself. How can we mock the <code>MyCollaborator</code> class?</p><figure class="highlight 
plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">public class MySut &#123;</span><br><span class="line">    public void myMethod() &#123;</span><br><span class="line">        MyCollaborator collaborator = new MyCollaborator();</span><br><span class="line">        // some behaviour worth testing here which uses collaborator</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>PS. the correct way with dependency injection should be<br><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">public class MySut &#123;</span><br><span class="line">    public void myMethod(MyCollaborator collaborator) &#123;</span><br><span class="line">        // some behaviour worth testing here which uses collaborator</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br></pre></td></tr></table></figure></p><h4 id="PowerMock-to-Rescue"><a href="#PowerMock-to-Rescue" class="headerlink" title="PowerMock to Rescue"></a>PowerMock to Rescue</h4><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span 
class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br></pre></td><td class="code"><pre><span class="line"></span><br><span class="line">import org.powermock.api.mockito.PowerMockito;</span><br><span class="line">import org.powermock.core.classloader.annotations.PrepareForTest;</span><br><span class="line">import org.testng.IObjectFactory;</span><br><span class="line">import org.testng.annotations.ObjectFactory;</span><br><span class="line">import org.testng.annotations.Test;</span><br><span class="line"></span><br><span class="line">import static org.powermock.api.mockito.PowerMockito.mock;</span><br><span class="line"></span><br><span class="line">@PrepareForTest(NotDOCInjectedSUT.class)</span><br><span class="line">public class NonInjectedDOCTest&#123;</span><br><span class="line">    @ObjectFactory</span><br><span class="line">    public IObjectFactory getObjectFactory() &#123;</span><br><span class="line">        return new org.powermock.modules.testng.PowerMockObjectFactory();</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    @Test</span><br><span class="line">    public void testMyMethod() throws Exception &#123;</span><br><span class="line">        NotDOCInjectedSUT sut = new NotDOCInjectedSUT();</span><br><span class="line">        MyCollaborator collaborator = mock(MyCollaborator.class);</span><br><span class="line"></span><br><span class="line">        // the whenNew function applies to</span><br><span class="line">        // 
normal test using Mockito&apos;s syntax</span><br><span class="line">        // e.g. Mockito.when(collaborator.someMethod()).thenReturn(...)</span><br><span class="line">        PowerMockito.whenNew(MyCollaborator.class)</span><br><span class="line">                .withNoArguments().thenReturn(collaborator);</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br></pre></td></tr></table></figure><p>Some new annotations and methods are introduced here:</p><ul><li>The <code>@PrepareForTest</code> annotation informs PowerMock that the <code>NotDOCInjectedSUT</code> class will create a new instance of some other class. In general, this is how PowerMock learns which classes need bytecode manipulation.</li><li>In order to use PowerMock with TestNG, we need to make PowerMock responsible for creating all of the test instances, so we use the <code>@ObjectFactory</code> annotation to register <code>PowerMockObjectFactory</code> as the factory that generates test instances.</li><li>The test double is created as usual, with a static <code>mock()</code> method (imported here from <code>PowerMockito</code>, which mirrors Mockito’s).</li><li><code>whenNew()</code> is where the magic happens: whenever a new object of the <code>MyCollaborator</code> class gets created, our test double object (<code>collaborator</code>) will be used instead. 
Two of PowerMock’s methods, <code>whenNew()</code> and <code>withNoArguments()</code>, are used to control the execution of the no-argument constructor of the <code>MyCollaborator</code> class.</li><li>Note that <code>static</code> methods can be mocked too, much like the <code>new</code> operator.</li></ul><h4 id="ArgumentCaptor"><a href="#ArgumentCaptor" class="headerlink" title="ArgumentCaptor"></a>ArgumentCaptor</h4><p>But what if the constructor of the <code>collaborator</code> class takes parameters?</p><p>Here we have a not-so-well-designed class, <code>PIM</code> (Personal Information Manager?), and the related types <code>Calendar</code> and <code>Meeting</code>.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br></pre></td><td class="code"><pre><span class="line">public interface Calendar &#123;</span><br><span class="line">    public void addEvent(Event event);</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span 
class="line">public class Meeting implements Event &#123;</span><br><span class="line">    private final Date startDate;</span><br><span class="line">    private final Date endDate;</span><br><span class="line">    public Meeting(Date startDate, Date endDate) &#123;</span><br><span class="line">        this.startDate = new Date(startDate.getTime());</span><br><span class="line">        this.endDate = new Date(endDate.getTime());</span><br><span class="line">    &#125;</span><br><span class="line">    public Date getStartDate() &#123;</span><br><span class="line">        return startDate;</span><br><span class="line">    &#125;</span><br><span class="line">    public Date getEndDate() &#123;</span><br><span class="line">        return endDate;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">public class PIM &#123;</span><br><span class="line">    private final static int MILLIS_IN_MINUTE = 60 * 1000;</span><br><span class="line">    private Calendar calendar;</span><br><span class="line">    public PIM(Calendar calendar) &#123;</span><br><span class="line">        this.calendar = calendar;</span><br><span class="line">    &#125;</span><br><span class="line">    public void addMeeting(Date startDate, int durationInMinutes) &#123;</span><br><span class="line">        Date endDate = new Date(startDate.getTime() + MILLIS_IN_MINUTE * durationInMinutes);</span><br><span class="line"></span><br><span class="line">        Meeting meeting = new Meeting(startDate, endDate);</span><br><span class="line">        calendar.addEvent(meeting);</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br></pre></td></tr></table></figure><p>As we can see, the <code>Meeting</code> created inside the <code>addMeeting</code> method is hard to replace with a mock, because it is instantiated with <code>new</code> inside the method. 
However, Mockito provides <code>ArgumentCaptor</code>, which we can use to capture the <code>Meeting</code> passed to <code>addEvent()</code> and inspect it.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br></pre></td><td class="code"><pre><span class="line">import org.mockito.ArgumentCaptor;</span><br><span class="line">import org.testng.annotations.Test;</span><br><span class="line"></span><br><span class="line">import java.util.Date;</span><br><span class="line"></span><br><span class="line">import static org.mockito.Mockito.mock;</span><br><span class="line">import static org.mockito.Mockito.verify;</span><br><span class="line">import static org.testng.Assert.*;</span><br><span class="line"></span><br><span class="line">/**</span><br><span class="line"> * Created by wenzhong on 5/10/15.</span><br><span class="line"> */</span><br><span class="line"></span><br><span class="line">public class PIMTest &#123;</span><br><span class="line">    private static final int ONE_HOUR = 60;</span><br><span class="line">    private static final Date START_DATE = new Date();</span><br><span class="line">    private static final int MILLIS_IN_MINUTE = 1000 * 60;</span><br><span class="line">    private static final Date END_DATE</span><br><span class="line">            = new Date(START_DATE.getTime() + ONE_HOUR * MILLIS_IN_MINUTE);</span><br><span class="line"></span><br><span class="line">    @Test</span><br><span class="line">    public void shouldAddNewEventToCalendar() &#123;</span><br><span class="line">        Calendar calendar = mock(Calendar.class);</span><br><span class="line">        PIM pim = new PIM(calendar);</span><br><span class="line"></span><br><span class="line">        // An ArgumentCaptor is created here;</span><br><span class="line">        // it will record arguments of the type Meeting.</span><br><span class="line">        ArgumentCaptor&lt;Meeting&gt; argument</span><br><span class="line">                = ArgumentCaptor.forClass(Meeting.class);</span><br><span class="line"></span><br><span class="line">        pim.addMeeting(START_DATE, ONE_HOUR);</span><br><span class="line"></span><br><span class="line">        // Verify that addEvent() was called,</span><br><span class="line">        // and have Mockito capture the argument of that call.</span><br><span class="line">        verify(calendar).addEvent(argument.capture());</span><br><span class="line"></span><br><span class="line">        Meeting meeting = argument.getValue();</span><br><span class="line">        assertEquals(meeting.getStartDate(), START_DATE);</span><br><span class="line">        assertEquals(meeting.getEndDate(), END_DATE);</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br></pre></td></tr></table></figure><p>In the above code, the actual argument to the <code>addEvent()</code> method is extracted from the <code>ArgumentCaptor</code> object.</p>
<hr><h1 id="Writing-Testable-Code"><a href="#Writing-Testable-Code" class="headerlink" title="Writing Testable Code"></a>Writing Testable Code</h1><p>Code that wasn’t designed to be testable is not testable.</p><h3 id="Rely-on-Dependency-Injection"><a href="#Rely-on-Dependency-Injection" class="headerlink" title="Rely on Dependency Injection"></a>Rely on Dependency Injection</h3><p>That way, every dependency can be replaced with a mock.</p><h3 id="Never-hide-a-TUF-within-TUC"><a href="#Never-hide-a-TUF-within-TUC" class="headerlink" title="Never hide a TUF within TUC"></a>Never hide a TUF within TUC</h3><p>A <em>TUF</em>, or <em>Test-Unfriendly Feature</em>, is something we do not want to run in a unit test. Examples include:</p><ul><li>Database access</li><li>File system access</li><li>Network access</li><li>Other access with side effects</li><li>Lengthy / inscrutable computations</li><li>Static variable usage</li></ul><p>A <em>TUC</em>, or <em>Test-Unfriendly Construct</em>, is a language construct that cannot easily be replaced in a test. Examples include:</p><ul><li>Final methods</li><li>Final classes</li><li>Static methods</li><li>Private methods</li><li>Static initialization expressions</li><li>Constructors</li><li>Object initialization blocks</li><li>New-expressions</li></ul><h3 id="Handling-existing-issues"><a href="#Handling-existing-issues" class="headerlink" title="Handling existing issues"></a>Handling existing issues</h3><ol><li>Subclass and override</li></ol><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span 
class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line">public class EventResponder</span><br><span class="line">&#123;</span><br><span class="line">    void showNotification(String notificationMessage) &#123;</span><br><span class="line">        JOptionPane.showMessageDialog(null, notificationMessage);</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    public void respond() &#123;</span><br><span class="line">        ...</span><br><span class="line">        showNotification(message);</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><figure class="highlight plain"><figcaption><span>testRespond</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">public class TestingEventResponder extends EventResponder</span><br><span class="line">&#123;</span><br><span class="line">    @Override</span><br><span class="line">    void showNotification(String notificationMessage) &#123;</span><br><span class="line">        // no-op: suppress the dialog while testing</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>When can we <em>not</em> use this “subclass and override” technique?</p><ul><li>Final method (it cannot be overridden)</li><li>Final class (it cannot be subclassed)</li><li>Static method (it cannot be overridden)</li><li>Private method (it cannot be overridden)</li><li>Constructor (it always runs, and there is no way to override it)</li></ul><p>Strictly speaking, final classes and final methods are only a problem in testing 
when they hide a TUF, but it’s nice to have a carefully considered reason for using them rather than just using them by default.</p><h1 id="References-amp-Further-Reading"><a href="#References-amp-Further-Reading" class="headerlink" title="References &amp; Further Reading"></a>References &amp; Further Reading</h1><ul><li><a href="http://martinfowler.com/bliki/FluentInterface.html" target="_blank" rel="noopener">http://martinfowler.com/bliki/FluentInterface.html</a></li><li><a href="http://www.objectmentor.com/resources/articles/TestableJava.pdf" target="_blank" rel="noopener">Testable Java</a> by Michael Feathers</li><li><a href="http://testng.org/doc/book.html" target="_blank" rel="noopener">Next Generation Java Testing: TestNG and Advanced Concepts</a> By Cédric Beust, Hani Suleiman</li><li><a href="http://www.amazon.com/Working-Effectively-Legacy-Michael-Feathers/dp/0131177052" target="_blank" rel="noopener">Working Effectively With Legacy Code</a> by Michael Feathers</li><li><a href="http://www.amazon.com/Refactoring-Improving-Design-Existing-Code/dp/0201485672/" target="_blank" rel="noopener">Refactoring: Improving the Design of Existing Code</a> by Martin Fowler, Kent Beck, John Brant, William Opdyke, Don Roberts</li><li><a href="http://www.amazon.com/Growing-Object-Oriented-Software-Guided-Tests/dp/0321503627/" target="_blank" rel="noopener">Growing Object-Oriented Software, Guided by Tests</a> by Steve Freeman , Nat Pryce </li><li><a href="http://www.amazon.com/Java-Concurrency-Practice-Brian-Goetz/dp/0321349601/" target="_blank" rel="noopener">Java Concurrency in Practice</a></li><li><a href="http://www.amazon.com/gp/product/0132350882martin2008" target="_blank" rel="noopener">Clean Code: A Handbook of Agile Software Craftsmanship</a> By Robert C. Martin</li></ul>]]></content>
    
    <summary type="html">
    
      &lt;p&gt;&lt;img src=&quot;https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/testing.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;The importance of writing unit tests has been heavily discussed. However, reality is cruel. Many developers do not write unit tests (UT) at all, an even larger number write token tests just to clear the code coverage bar, and some ambitious developers are not quite sure how to write good ones. Most good UTs are written by top developers.&lt;/p&gt;
&lt;p&gt;I consider myself one of those ambitious developers, so I spent two weekends getting started. Most cases listed below are taken or adapted from the book &lt;a href=&quot;http://www.amazon.com/Practical-Unit-Testing-TestNG-Mockito/dp/839348930X&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;“Practical Unit Testing with TestNG and Mockito”&lt;/a&gt; by &lt;a href=&quot;http://kaczanowscy.pl/tomek/&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;Tomek Kaczanowski&lt;/a&gt;, a really good book that is worth reading.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&quot;Concepts&quot;&gt;&lt;a href=&quot;#Concepts&quot; class=&quot;headerlink&quot; title=&quot;Concepts&quot;&gt;&lt;/a&gt;Concepts&lt;/h2&gt;&lt;p&gt;Before we start, let’s visit the concepts of SUT and DOC.&lt;/p&gt;
    
    </summary>
    
      <category term="Engineering" scheme="http://fwz.github.io/categories/Engineering/"/>
    
      <category term="Engineering Excellence" scheme="http://fwz.github.io/categories/Engineering/Engineering-Excellence/"/>
    
    
      <category term="Unit Test" scheme="http://fwz.github.io/tags/Unit-Test/"/>
    
  </entry>
  
  <entry>
    <title>Linear Algebra in Room Escape</title>
    <link href="http://fwz.github.io/2015/04/12/Linear-Algebra-in-Room-Escape/"/>
    <id>http://fwz.github.io/2015/04/12/Linear-Algebra-in-Room-Escape/</id>
    <published>2015-04-12T04:26:54.000Z</published>
    <updated>2019-05-02T17:20:41.766Z</updated>
    
    <content type="html"><![CDATA[<p><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/lights_out_2.png" alt=""></p><p>Last weekend I played a real-life room escape game with some friends. It was a lot of fun, and we escaped just in time!</p><p>The only thing that made it less than perfect is that we skipped a puzzle that the staff in the room described as “very hard”. We spent 10 minutes on it and could not find an effective way to solve it.</p><p>The game consists of a board with 5 rows * 5 columns = 25 lights. Each light is either on or off. The player can toggle any light, but toggling a light also toggles its neighbours directly above, below, left and right of it at the same time. The goal is, from a given initial state, to toggle some of the lights so that all the lights end up on.</p><p>The picture below illustrates this “switch logic”.</p><p><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/59/LightsOutIllustration.png/320px-LightsOutIllustration.png" alt="Light out example"></p><p>For example, if the initial state looks like the following board (O means a lit light and X means an unlit one), then we can toggle the 2 lights at (2,1) and (4,3) to turn all the lights on. But is there a systematic way to find a solution? 
</p><a id="more"></a><table><thead><tr><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr></thead><tbody><tr><td>X</td><td>O</td><td>O</td><td>O</td><td>O</td><td></td><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td><td></td><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td></tr><tr><td>X</td><td>X</td><td>O</td><td>O</td><td>O</td><td></td><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td><td></td><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td></tr><tr><td>X</td><td>O</td><td>X</td><td>O</td><td>O</td><td>switch (2,1) =&gt;</td><td>O</td><td>O</td><td>X</td><td>O</td><td>O</td><td>switch (4,3) =&gt;</td><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td></tr><tr><td>O</td><td>X</td><td>X</td><td>X</td><td>O</td><td></td><td>O</td><td>X</td><td>X</td><td>X</td><td>O</td><td></td><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td></tr><tr><td>O</td><td>O</td><td>X</td><td>O</td><td>O</td><td></td><td>O</td><td>O</td><td>X</td><td>O</td><td>O</td><td></td><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td></tr></tbody></table><p>But not all cases are so straightforward. 
For example:</p><table><thead><tr><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr></thead><tbody><tr><td>X</td><td>O</td><td>O</td><td>O</td><td>O</td><td></td><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td><td></td><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td></tr><tr><td>X</td><td>O</td><td>O</td><td>O</td><td>O</td><td></td><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td><td></td><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td></tr><tr><td>X</td><td>O</td><td>O</td><td>O</td><td>O</td><td>OR</td><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td><td>OR</td><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td></tr><tr><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td><td></td><td>O</td><td>O</td><td>X</td><td>O</td><td>O</td><td></td><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td></tr><tr><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td><td></td><td>O</td><td>O</td><td>X</td><td>O</td><td>O</td><td></td><td>X</td><td>O</td><td>O</td><td>O</td><td>X</td></tr></tbody></table><h2 id="Some-Helpful-Deduction"><a href="#Some-Helpful-Deduction" class="headerlink" title="Some Helpful Deduction"></a>Some Helpful Deduction</h2><p>Before going further, here are 2 useful facts deduced from the game rules:</p><ol><li>We never need to toggle a light more than once, since toggling it twice cancels out.</li><li>The order of toggling does not matter.</li></ol><h2 id="Solution-1-Brutal-Enumeration"><a href="#Solution-1-Brutal-Enumeration" class="headerlink" title="Solution 1: Brutal Enumeration"></a>Solution 1: Brutal Enumeration</h2><p>We enumerate all possible combinations of switches from the current state and check whether any of them is a solution. Based on the deductions above, this approach has a time complexity of <span>$O(2^{mn})$</span><!-- Has MathJax -->, where m * n is the total number of lights.<br>The computation is feasible if m,n &lt; 5. 
But for a square grid with m = n = 6, it already takes quite a long time on a laptop.</p><h2 id="Solution-2-Linear-Algebra-Way"><a href="#Solution-2-Linear-Algebra-Way" class="headerlink" title="Solution 2: Linear Algebra Way"></a>Solution 2: Linear Algebra Way</h2><p>The following linear algebra approach is a more systematic way to solve it. In the illustration below I will use a 3*3 grid; once we understand it, we can extend it to a grid of any size.</p><p>First of all, the state of the lights is represented by a matrix <em>L</em>, where 1 means a light is on and 0 means it is off.<br>\[ L = \begin{vmatrix}<br>0 &amp; 1 &amp; 0 \\<br>1 &amp; 1 &amp; 0 \\<br>0 &amp; 1 &amp; 1<br>\end{vmatrix} \]</p><p>To turn all the lights on, we should toggle some lights so that their combined effect is<br>\[ \overline{L} = \begin{vmatrix}<br>1 &amp; 0 &amp; 1 \\<br>0 &amp; 0 &amp; 1 \\<br>1 &amp; 0 &amp; 0<br>\end{vmatrix} \]</p><p>starting from the <strong>0</strong> matrix (the entries equal to 1 are exactly the lights that are currently off).</p><p>The action of the switch placed at (i,j) can be interpreted as the matrix <span>${A}_{ij}$</span><!-- Has MathJax --> , where <span>$A_{ij}$</span><!-- Has MathJax --> is the matrix in which the only entries equal to 1 are those placed at (i,j) and in the adjacent positions; there are essentially three types of matrices <span>${A}_{ij}$</span><!-- Has MathJax -->, for different types of position (corner, edge, internal):</p><span>$${A}_{ij}=\begin{cases}\begin{vmatrix}1 & 1 & 0 \\1 & 0 & 0 \\0 & 0 & 0\end{vmatrix}& \text{if (i,j) refers to the top-left corner light}\\\begin{vmatrix}1 & 1 & 1 \\0 & 1 & 0 \\0 & 0 & 0\end{vmatrix}& \text{if (i,j) refers to the top-middle light}\\\begin{vmatrix}0 & 1 & 0 \\1 & 1 & 1 \\0 & 1 & 0\end{vmatrix}& \text{if (i,j) refers to the internal light}\\\text{...}\\\end{cases}$$</span><!-- Has MathJax --><p>Every winning combination of moves can be expressed mathematically in the form:</p><span>$$\sum_{i,j} {x_{ij} } { {A}_{ij} } = \overline{L}$$</span><!-- Has MathJax --><p>Each coefficient 
$x_{ij}$ represents the number of times that switch (i,j) has to be pressed. According to our previous deduction, it can only be 0 or 1. If we flatten each <span>${A}_{ij}$</span><!-- Has MathJax --> into one column of a 9*9 matrix, and flatten the target on the right-hand side in the same way, we get the following equation:</p><span>$$\begin{vmatrix}1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\1 & 1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\0 & 1 & 1 & 0 & 0 & 1 & 0 & 0 & 0 \\1 & 0 & 0 & 1 & 1 & 0 & 1 & 0 & 0 \\0 & 1 & 0 & 1 & 1 & 1 & 0 & 1 & 0 \\0 & 0 & 1 & 0 & 1 & 1 & 0 & 0 & 1 \\0 & 0 & 0 & 1 & 0 & 0 & 1 & 1 & 0 \\0 & 0 & 0 & 0 & 1 & 0 & 1 & 1 & 1 \\0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 1 \end{vmatrix} \begin{vmatrix}{x_{11}} \\{x_{12}} \\{x_{13}} \\{x_{21}} \\{x_{22}} \\{x_{23}} \\{x_{31}} \\{x_{32}} \\{x_{33}} \\\end{vmatrix}=\begin{vmatrix}1 \\0 \\1 \\0 \\0 \\1 \\1 \\0 \\0\end{vmatrix}$$</span><!-- Has MathJax --><p>Pretty good. <strong>But how can we solve this equation?</strong> It is not a traditional linear system: the arithmetic is done modulo 2, where addition is what we call “XOR”. But the basic idea is the same: we just redefine operations such as “add” and “multiply”, and then solve the system as usual.</p><p>There are many solutions available in different languages; let’s take a look at a Python version provided by <a href="https://github.com/pmneila">pmneila</a>. 
I added some comments to the code to (hopefully) make it easier to understand.</p><figure class="highlight python"><figcaption><span>lightout_solver.py</span><a href="https://github.com/pmneila/Lights-Out/blob/master/lightsout.py" target="_blank" rel="noopener">lightsout.py</a></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span 
class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br><span class="line">110</span><br><span class="line">111</span><br><span class="line">112</span><br><span class="line">113</span><br><span class="line">114</span><br><span 
class="line">115</span><br><span class="line">116</span><br><span class="line">117</span><br><span class="line">118</span><br><span class="line">119</span><br><span class="line">120</span><br><span class="line">121</span><br><span class="line">122</span><br><span class="line">123</span><br><span class="line">124</span><br><span class="line">125</span><br><span class="line">126</span><br><span class="line">127</span><br><span class="line">128</span><br><span class="line">129</span><br><span class="line">130</span><br><span class="line">131</span><br><span class="line">132</span><br><span class="line">133</span><br><span class="line">134</span><br><span class="line">135</span><br><span class="line">136</span><br><span class="line">137</span><br><span class="line">138</span><br><span class="line">139</span><br><span class="line">140</span><br><span class="line">141</span><br><span class="line">142</span><br><span class="line">143</span><br><span class="line">144</span><br><span class="line">145</span><br><span class="line">146</span><br><span class="line">147</span><br><span class="line">148</span><br><span class="line">149</span><br><span class="line">150</span><br><span class="line">151</span><br><span class="line">152</span><br><span class="line">153</span><br><span class="line">154</span><br><span class="line">155</span><br><span class="line">156</span><br><span class="line">157</span><br><span class="line">158</span><br><span class="line">159</span><br><span class="line">160</span><br><span class="line">161</span><br><span class="line">162</span><br><span class="line">163</span><br><span class="line">164</span><br><span class="line">165</span><br><span class="line">166</span><br><span class="line">167</span><br><span class="line">168</span><br><span class="line">169</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># coding: utf-8</span></span><br><span class="line"></span><br><span class="line"><span 
class="string">"""</span></span><br><span class="line"><span class="string">The following code based on</span></span><br><span class="line"><span class="string">https://github.com/pmneila/Lights-Out/blob/master/lightsout.py</span></span><br><span class="line"><span class="string">"""</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">from</span> operator <span class="keyword">import</span> add</span><br><span class="line"><span class="keyword">from</span> itertools <span class="keyword">import</span> chain, combinations</span><br><span class="line"><span class="keyword">import</span> numpy <span class="keyword">as</span> np</span><br><span class="line"><span class="keyword">from</span> scipy <span class="keyword">import</span> ndimage</span><br><span class="line"></span><br><span class="line"><span class="comment"># First, we define operation in Galois Field (https://en.wikipedia.org/wiki/Finite_field)</span></span><br><span class="line"><span class="comment"># Here what we need is an Z/2Z (A mod 2 Galois Fields)</span></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">GF2</span><span class="params">(object)</span>:</span></span><br><span class="line">    <span class="string">"""Galois field GF(2)."""</span></span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self, a=<span class="number">0</span>)</span>:</span></span><br><span class="line">        self.value = int(a) &amp; <span class="number">1</span></span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__add__</span><span class="params">(self, rhs)</span>:</span></span><br><span class="line">        <span class="keyword">return</span> GF2(self.value + GF2(rhs).value)</span><br><span class="line"></span><br><span 
class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__mul__</span><span class="params">(self, rhs)</span>:</span></span><br><span class="line">        <span class="keyword">return</span> GF2(self.value * GF2(rhs).value)</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__sub__</span><span class="params">(self, rhs)</span>:</span></span><br><span class="line">        <span class="keyword">return</span> GF2(self.value - GF2(rhs).value)</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__div__</span><span class="params">(self, rhs)</span>:</span></span><br><span class="line">        <span class="keyword">return</span> GF2(self.value / GF2(rhs).value)</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__repr__</span><span class="params">(self)</span>:</span></span><br><span class="line">        <span class="keyword">return</span> str(self.value)</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__eq__</span><span class="params">(self, rhs)</span>:</span></span><br><span class="line">        <span class="keyword">if</span> isinstance(rhs, GF2):</span><br><span class="line">            <span class="keyword">return</span> self.value == rhs.value</span><br><span class="line">        <span class="keyword">return</span> self.value == rhs</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__le__</span><span class="params">(self, rhs)</span>:</span></span><br><span class="line">        <span class="keyword">if</span> isinstance(rhs, GF2):</span><br><span class="line">            <span 
class="keyword">return</span> self.value &lt;= rhs.value</span><br><span class="line">        <span class="keyword">return</span> self.value &lt;= rhs</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__lt__</span><span class="params">(self, rhs)</span>:</span></span><br><span class="line">        <span class="keyword">if</span> isinstance(rhs, GF2):</span><br><span class="line">            <span class="keyword">return</span> self.value &lt; rhs.value</span><br><span class="line">        <span class="keyword">return</span> self.value &lt; rhs</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__int__</span><span class="params">(self)</span>:</span></span><br><span class="line">        <span class="keyword">return</span> self.value</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__long__</span><span class="params">(self)</span>:</span></span><br><span class="line">        <span class="keyword">return</span> self.value</span><br><span class="line"></span><br><span class="line"><span class="comment"># Encapsulate operation for vectorization computation</span></span><br><span class="line">GF2array = np.vectorize(GF2)</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">gjel</span><span class="params">(A)</span>:</span></span><br><span class="line">    <span class="string">"""Gauss-Jordan elimination."""</span></span><br><span class="line">    nulldim = <span class="number">0</span></span><br><span class="line">    <span class="keyword">for</span> i <span class="keyword">in</span> xrange(len(A)):</span><br><span class="line">        pivot = A[i:,i].argmax() + i</span><br><span class="line">        <span 
class="keyword">if</span> A[pivot,i] == <span class="number">0</span>:</span><br><span class="line">            nulldim = len(A) - i</span><br><span class="line">            <span class="keyword">break</span></span><br><span class="line">        row = A[pivot] / A[pivot,i]</span><br><span class="line">        A[pivot] = A[i]</span><br><span class="line">        A[i] = row</span><br><span class="line"></span><br><span class="line">        <span class="keyword">for</span> j <span class="keyword">in</span> xrange(len(A)):</span><br><span class="line">            <span class="keyword">if</span> j == i:</span><br><span class="line">                <span class="keyword">continue</span></span><br><span class="line">            A[j] -= row*A[j,i]</span><br><span class="line">    <span class="keyword">return</span> A, nulldim</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">GF2inv</span><span class="params">(A)</span>:</span></span><br><span class="line">    <span class="string">"""Inversion(逆) and eigenvectors(特征向量) of the null-space of a GF2 matrix."""</span></span><br><span class="line">    n = len(A)</span><br><span class="line">    <span class="keyword">assert</span> n == A.shape[<span class="number">1</span>], <span class="string">"Matrix must be square"</span></span><br><span class="line"></span><br><span class="line">    A = np.hstack([A, np.eye(n)])</span><br><span class="line"></span><br><span class="line">    B, nulldim = gjel(GF2array(A))</span><br><span class="line"></span><br><span class="line">    inverse = np.int_(B[-n:, -n:])</span><br><span class="line">    E = B[:n, :n]</span><br><span class="line">    null_vectors = []</span><br><span class="line">    <span class="keyword">if</span> nulldim &gt; <span class="number">0</span>:</span><br><span class="line">        null_vectors = E[:, -nulldim:]</span><br><span class="line">        null_vectors[-nulldim:, :] = 
GF2array(np.eye(nulldim))</span><br><span class="line">        null_vectors = np.int_(null_vectors.T)</span><br><span class="line"></span><br><span class="line">    <span class="keyword">print</span> <span class="string">"inverse of matrix:"</span></span><br><span class="line">    <span class="keyword">print</span> inverse</span><br><span class="line">    <span class="keyword">print</span> <span class="string">"eigenvectors of matrix:"</span></span><br><span class="line">    <span class="keyword">print</span> null_vectors</span><br><span class="line">    <span class="keyword">return</span> inverse, null_vectors</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">powerset</span><span class="params">(iterable)</span>:</span></span><br><span class="line">    <span class="string">"powerset([1,2,3]) --&gt; () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"</span></span><br><span class="line">    s = list(iterable)</span><br><span class="line">    <span class="keyword">return</span> chain.from_iterable(combinations(s, r) <span class="keyword">for</span> r <span class="keyword">in</span> range(len(s)+<span class="number">1</span>))</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">lightsoutbase</span><span class="params">(n)</span>:</span></span><br><span class="line">    <span class="comment"># base of the lights out problem of size (n, n)</span></span><br><span class="line">    a = np.eye(n*n)</span><br><span class="line">    a = np.reshape(a, (n*n, n, n))</span><br><span class="line"></span><br><span class="line">    <span class="comment"># construct the A_&#123;ij&#125; Matrix</span></span><br><span class="line">    a = np.array(map(ndimage.binary_dilation, a))</span><br><span class="line">    lo_base = np.reshape(a, (n*n, n*n))</span><br><span class="line">    <span class="keyword">return</span> 
lo_base</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">LightsOut</span><span class="params">(object)</span>:</span></span><br><span class="line">    <span class="string">"""Lights-Out solver."""</span></span><br><span class="line">    </span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self, size=<span class="number">5</span>)</span>:</span></span><br><span class="line">        self.n = size</span><br><span class="line">        self.base = lightsoutbase(self.n)</span><br><span class="line">        self.invbase, self.null_vectors = GF2inv(self.base)</span><br><span class="line">    </span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">solve</span><span class="params">(self, b)</span>:</span></span><br><span class="line">        b = np.asarray(b)</span><br><span class="line">        <span class="keyword">assert</span> b.shape[<span class="number">0</span>] == b.shape[<span class="number">1</span>] == self.n, <span class="string">"incompatible shape"</span></span><br><span class="line">        </span><br><span class="line">        <span class="keyword">if</span> <span class="keyword">not</span> self.issolvable(b):</span><br><span class="line">            <span class="keyword">raise</span> ValueError, <span class="string">"The given setup is not solvable"</span></span><br><span class="line">        </span><br><span class="line">        <span class="comment"># Find the base solution.</span></span><br><span class="line">        first = np.dot(self.invbase, b.ravel()) &amp; <span class="number">1</span></span><br><span class="line">        </span><br><span class="line">        <span class="comment"># Given a solution, we can find more valid solutions</span></span><br><span class="line">        <span class="comment"># adding any 
combination of the null vectors.</span></span><br><span class="line">        <span class="comment"># Find the solution with the minimum number of 1's.</span></span><br><span class="line">        solutions = [(first + reduce(add, nvs, <span class="number">0</span>))&amp;<span class="number">1</span> <span class="keyword">for</span> nvs <span class="keyword">in</span> powerset(self.null_vectors)]</span><br><span class="line">        final = min(solutions, key=<span class="keyword">lambda</span> x: x.sum())</span><br><span class="line">        <span class="keyword">return</span> np.reshape(final, (self.n,self.n))</span><br><span class="line">    </span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">issolvable</span><span class="params">(self, b)</span>:</span></span><br><span class="line">        <span class="string">"""Determine if the given configuration is solvable.</span></span><br><span class="line"><span class="string">        </span></span><br><span class="line"><span class="string">        A configuration is solvable if it is orthogonal to</span></span><br><span class="line"><span class="string">        the null vectors of the base.</span></span><br><span class="line"><span class="string">        """</span></span><br><span class="line">        b = np.asarray(b)</span><br><span class="line">        <span class="keyword">assert</span> b.shape[<span class="number">0</span>] == b.shape[<span class="number">1</span>] == self.n, <span class="string">"incompatible shape"</span></span><br><span class="line">        b = b.ravel()</span><br><span class="line">        p = map(<span class="keyword">lambda</span> x: np.dot(x,b)&amp;<span class="number">1</span>, self.null_vectors)</span><br><span class="line">        <span class="keyword">return</span> <span class="keyword">not</span> any(p)</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> __name__ == <span 
class="string">"__main__"</span>:</span><br><span class="line"></span><br><span class="line">    b = np.array([[<span class="number">1</span>,<span class="number">0</span>,<span class="number">1</span>],</span><br><span class="line">                  [<span class="number">0</span>,<span class="number">0</span>,<span class="number">1</span>],</span><br><span class="line">                  [<span class="number">1</span>,<span class="number">0</span>,<span class="number">0</span>]])</span><br><span class="line">    <span class="keyword">print</span> <span class="string">"\nlights status:"</span></span><br><span class="line">    <span class="keyword">print</span> b</span><br><span class="line"></span><br><span class="line">    lo = LightsOut(<span class="number">3</span>)</span><br><span class="line"></span><br><span class="line">    bsol = lo.solve(b)</span><br><span class="line"></span><br><span class="line">    <span class="keyword">print</span> <span class="string">"\nThe solution of"</span></span><br><span class="line">    <span class="keyword">print</span> b</span><br><span class="line">    <span class="keyword">print</span> <span class="string">"is"</span></span><br><span class="line">    <span class="keyword">print</span> bsol</span><br><span class="line"></span><br></pre></td></tr></table></figure><p>run it and it give output like:<br><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span 
class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line">$ python lightsout_solver.py</span><br><span class="line"></span><br><span class="line">lights status:</span><br><span class="line">[[1 0 1]</span><br><span class="line"> [0 0 1]</span><br><span class="line"> [1 0 0]]</span><br><span class="line">inverse of matrix:</span><br><span class="line">[[1 0 1 0 0 1 1 1 0]</span><br><span class="line"> [0 0 0 0 1 0 1 1 1]</span><br><span class="line"> [1 0 1 1 0 0 0 1 1]</span><br><span class="line"> [0 0 1 0 1 1 0 0 1]</span><br><span class="line"> [0 1 0 1 1 1 0 1 0]</span><br><span class="line"> [1 0 0 1 1 0 1 0 0]</span><br><span class="line"> [1 1 0 0 0 1 1 0 1]</span><br><span class="line"> [1 1 1 0 1 0 0 0 0]</span><br><span class="line"> [0 1 1 1 0 0 1 0 1]]</span><br><span class="line">eigenvectors of matrix:</span><br><span class="line">[]</span><br><span class="line"></span><br><span class="line">The solution of</span><br><span class="line">[[1 0 1]</span><br><span class="line"> [0 0 1]</span><br><span class="line"> [1 0 0]]</span><br><span class="line">is</span><br><span class="line">[[0 1 0]</span><br><span class="line"> [0 1 0]</span><br><span class="line"> [1 0 0]]</span><br><span class="line"></span><br></pre></td></tr></table></figure></p><p>Solving this equation for the coefficients with <span>$x_{ij} = 1$</span><!-- Has MathJax -->, we get the solution <span>$[x_{12}, x_{22}, x_{31}]$</span><!-- Has MathJax 
-->.</p><table><thead><tr><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr></thead><tbody><tr><td>X</td><td>O</td><td>X</td><td></td><td>O</td><td>X</td><td>O</td><td></td><td>O</td><td>O</td><td>O</td><td></td><td>O</td><td>O</td><td>O</td></tr><tr><td>O</td><td>O</td><td>X</td><td>switch (1,2) =&gt;</td><td>O</td><td>X</td><td>X</td><td>switch (2,2) =&gt;</td><td>X</td><td>O</td><td>O</td><td>switch (3,1) =&gt;</td><td>O</td><td>O</td><td>O</td></tr><tr><td>X</td><td>O</td><td>O</td><td></td><td>X</td><td>O</td><td>O</td><td></td><td>X</td><td>X</td><td>O</td><td></td><td>O</td><td>O</td><td>O</td></tr></tbody></table><h2 id="Solution-3-Light-Chasing"><a href="#Solution-3-Light-Chasing" class="headerlink" title="Solution 3: Light Chasing"></a>Solution 3: Light Chasing</h2><p>The solution above works, but it is unreasonable to ask someone to solve a system of equations with 25 unknowns in the middle of a room escape game. The following approach is much easier to carry out by hand. Its principle is to “reduce any board to one of a few known boards, and then solve those with known strategies”. </p><p>Here is how it proceeds:</p><ol><li>Rows are handled one at a time, starting with the top row. Each light that is still off in the row is turned on by toggling the light directly below it in the next row. </li><li>Apply the same method to rows 2 to 4.</li><li>The last row is solved separately, depending on which of its lights are still off. 
The corresponding lights in the top row (see the table below) are toggled and the same row-by-row procedure is run again, which yields a solution.</li></ol><table><thead><tr><th>Bottom row</th><th>Switch in top row</th></tr></thead><tbody><tr><td>XOOOX</td><td>XXOOO</td></tr><tr><td>OXOXO</td><td>XOOXO</td></tr><tr><td>XXXOO</td><td>OXOOO</td></tr><tr><td>OOXXX</td><td>OOOXO</td></tr><tr><td>XOXXO</td><td>OOOOX</td></tr><tr><td>OXXOX</td><td>XOOOO</td></tr><tr><td>XXOXX</td><td>OOXOO</td></tr></tbody></table><p>This approach always leads to a solution (if one exists) for a 5 * 5 grid. It might be the ideal approach when you are trapped in the game :).</p>]]></content>
    
    <summary type="html">
    
      &lt;p&gt;&lt;img src=&quot;https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/lights_out_2.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Last weekend I went to play a real-life room escape game with some friends. It was a lot of fun and we finally escaped on time!&lt;/p&gt;
&lt;p&gt;The only thing that made it less than perfect is that we skipped a puzzle that the staff in the room called “very hard”. We spent 10 minutes on it and could not find an effective way to solve it. &lt;/p&gt;
&lt;p&gt;The game consists of a board with 5 rows * 5 columns = 25 lights. Each light is either on or off. The player can switch any light on/off, but switching a light also switches its neighbours in the up/down/left/right positions at the same time. The goal of the game is, for a given initial status, to switch some of the lights so that all the lights are on.&lt;/p&gt;
&lt;p&gt;You can also refer to this graph for the “switch logic”.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://upload.wikimedia.org/wikipedia/commons/thumb/5/59/LightsOutIllustration.png/320px-LightsOutIllustration.png&quot; alt=&quot;Light out example&quot;&gt;&lt;/p&gt;
&lt;p&gt;For example, if the initial status looks like the following board (O means a lit light and X means an off light), then we could switch the 2 lights at (2,1) and (4,3) to turn all the lights on. But is there a systematic way to get a solution? &lt;/p&gt;
    
    </summary>
    
      <category term="Game" scheme="http://fwz.github.io/categories/Game/"/>
    
      <category term="Room Escape" scheme="http://fwz.github.io/categories/Game/Room-Escape/"/>
    
    
      <category term="Linear Algebra" scheme="http://fwz.github.io/tags/Linear-Algebra/"/>
    
  </entry>
  
  <entry>
    <title>Evolution of Metric System Architecture</title>
    <link href="http://fwz.github.io/2015/03/26/Evolution-of-Metric-System-Architecture/"/>
    <id>http://fwz.github.io/2015/03/26/Evolution-of-Metric-System-Architecture/</id>
    <published>2015-03-26T06:31:12.000Z</published>
    <updated>2019-05-02T17:20:41.751Z</updated>
    
    <content type="html"><![CDATA[<p><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/data_system.png" alt=""></p><h2 id="Preface"><a href="#Preface" class="headerlink" title="Preface"></a>Preface</h2><p>In the past 2 years, I spent about 70% of my working time building, breaking, and fixing data products. This article is a brief retrospective on what I learned from building the whole system, and on which tools can be plugged in as components.</p><h3 id="Goal-of-a-data-System"><a href="#Goal-of-a-data-System" class="headerlink" title="Goal of a data System"></a>Goal of a data System</h3><p>We use data to understand reality and improve our product. This is the primary goal of a data/metric system. A good data system answers questions, a better data system identifies root causes, and an even better data system helps improve the whole system directly. </p><h3 id="Use-cases"><a href="#Use-cases" class="headerlink" title="Use cases"></a>Use cases</h3><p>At Yahoo!, the data platform I work on mainly supports a personalization system (a recommendation system). As the recommendation system iterated, we followed and forecast the actual use cases the team would need in order to understand or improve it. The major use cases for our system include: </p><ul><li>Understand system performance with reports from different key metrics</li><li>Detect / identify metric anomalies and data pipeline failures</li><li>Collect user feedback data to improve the system online in short cycles</li><li>Make it easy for PMs / developers / scientists to play with data</li></ul><p>At each stage, we focused on different aspects and used different tools / techniques to solve problems. 
Let me illustrate.</p><a id="more"></a><hr><h2 id="Stage-1-System-Validation"><a href="#Stage-1-System-Validation" class="headerlink" title="Stage 1: System Validation"></a>Stage 1: System Validation</h2><p><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/Dashboard Evolution.001.png" alt=""><br>In this stage, both the recommendation system and the metric system are prototypes. As the data team, our top priority is to use data to check whether the recommendation system works as expected, which means that at this stage we care more about our services than about our actual users. So we use a “scraper” that mocks thousands of different queries against our backend system. We then extract the instrumentation we are interested in and compare it with our design across different user profiles. The graph above shows this scraper pattern.</p><ol><li>We build a scraper to send many mock requests</li><li>Compute statistics from the responses to the mock requests</li><li>Write the statistical results to a local MySQL database</li><li>The front end of the data product queries MySQL directly to get the data and render it</li></ol><hr><h2 id="Stage-2-Report-System-Performance-Metrics"><a href="#Stage-2-Report-System-Performance-Metrics" class="headerlink" title="Stage 2: Report System Performance Metrics"></a>Stage 2: Report System Performance Metrics</h2><p><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/Dashboard Evolution.002.png" alt=""></p><p>Now we start to care more about users: we want to know how users in different segments interact with our product. So we collect the user behaviour log (which is not produced on our side) and compute metrics like retention rate and CTR (click-through rate) to measure user engagement.</p><p>Before expanding our system, let’s review our use case: “Understand system performance with reports from different key metrics”. Two estimates should be considered: the volume of data &amp; the latency of reports. 
</p><p>For the volume of data, we should be aware of the “dimension explosion” effect. For example, we might have a user location segment/dimension when reporting our DAU metric. If location is split by country, maybe 50 times the volume is needed; if it is split by city, the number gets much scarier. And this is only 1 dimension. Now think about combinations of different dimensions: the segments might be based on the age of the user, the location of the user, the login status of the user, the A/B test id, etc. The number of combinations is the product of the number of values of each dimension, so it soon explodes to a number you might not have imagined. So here we have to come up with a scalable solution.</p><p>While the volume of data might vary, the latency of reports should stay the same. Instant response is our goal, so low latency is a must-have feature.</p><p>Translated into design goals, these become:</p><ul><li><strong>Highly scalable</strong></li><li><strong>Low latency on read queries</strong></li></ul><h4 id="Storage"><a href="#Storage" class="headerlink" title="Storage"></a>Storage</h4><p>First of all, using Hadoop is natural:</p><ul><li>User behaviour logs (TBs per day) are aggregated on HDFS</li><li>Hadoop is still a very good playground for this kind of data manipulation.</li></ul><p>Considering the data volume and latency factors, we selected <a href="https://hbase.apache.org">HBase</a> for our data storage layer:</p><ul><li>Scalable when data volume increases</li><li>Friendly integration with HDFS and other Hadoop projects</li><li>Good (enough) latency on range queries</li><li>No relational query use case in the foreseeable future</li></ul><p>The alternatives have their shortcomings: Hive cannot provide instant responses, and it takes a lot of maintenance effort to scale MySQL and keep data in sync.</p><p>Once the stack was selected, we had to figure out a general schema design. In most reporting systems, the schema covers “metric id/name”, “dimensions”, and “values”. 
Since HBase is a rowkey-based KV database, this is mostly about designing how the rowkey is constructed, so that it represents “metric_id/name” and “dimensions” and stays <strong>backward compatible</strong>.</p><p>This is our rowkey schema:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">hash|metric_name^dim1_val^dim2_val^...|timestamp</span><br></pre></td></tr></table></figure><p>This schema provides the following features:</p><ul><li><strong>Load balance</strong>. The hash prefix, generated from the md5 of “metric_name” + “dimensions”, spreads rows across the HBase region servers. It also guarantees that all rows for the same combination of metric and dimensions fall into the same region server, as indicated by the hash value.</li><li>Scans are also efficient, since rows are sorted by rowkey: rows for the same metric with the same dimensions are clustered together and sorted by timestamp. Given a time range, we can define a start row and an end row and scan the rows in between.</li><li><strong>Backward compatibility</strong>. Note that not all metrics share the same dimensions. A shared configuration stores the rowkey definition for each metric. Dimensions are appended sequentially in the rowkey. For example, for metric M with dimensions P and Q, the key looks like “M^p^q”. After adding a dimension S, the key looks like “M^p^q^s”. However, sometimes we want to ignore dimension S in a report. In that case, the trailing empty dimension is omitted from the rowkey, so we still get “M^p^q”, which is backward compatible with the rowkey from before dimension S was added.</li></ul><h4 id="Data-Pipeline"><a href="#Data-Pipeline" class="headerlink" title="Data Pipeline"></a>Data Pipeline</h4><p>Most data pipelines are really ETL operations on data. In our case, ETL on the logs mainly outputs aggregated numbers over a limited set of fields. 
For this kind of work, a more reasonable and natural choice on Hadoop is <a href="https://pig.apache.org/" target="_blank" rel="noopener">Pig</a> rather than <a href="https://hive.apache.org/" target="_blank" rel="noopener">Hive</a>. See <a href="https://developer.yahoo.com/blogs/hadoop/comparing-pig-latin-sql-constructing-data-processing-pipelines-444.html" target="_blank" rel="noopener">Alan Gates’ summary</a> for more. With support for UDFs (which can be written in Python/Ruby/JS), constructing an ETL data pipeline is efficient.</p><h4 id="Scheduler"><a href="#Scheduler" class="headerlink" title="Scheduler"></a>Scheduler</h4><p>Another tool we need is a job scheduler. If we want to generate a regular daily report, the most straightforward way is to set up a cron job that runs the report generation pipeline on a schedule. But in the real world, a pipeline might have external dependencies. In our case, we have to wait until the user log is available before we can start. Sure, we could write a loop in the cron job to wait, or trigger the job after a reasonable delay. But how long should we wait? How can we start the pipeline as soon as the dependency is ready? What if we have multiple dependencies?</p><p>Besides, the pipeline topology needs to be taken care of. A pipeline in our case covers at least 2 steps: metrics generation and persistence. Each one is a separate job. We should trigger the persistence job once the generation job succeeds, or fail the pipeline if it fails.</p><p><a href="https://oozie.apache.org/">Apache Oozie</a> came along and saved us tons of effort. In many cases we use it as a data trigger: when all the dependency data is ready, it triggers a series of jobs.</p><h4 id="Data-Product-Serving"><a href="#Data-Product-Serving" class="headerlink" title="Data Product Serving"></a>Data Product Serving</h4><p>To secure our backend data and to manage data access in a more uniform way, we implemented a serving layer using Tomcat. 
Since most operations in data products are READ operations, we mainly focus on the RPS (requests per second) of the serving layer. For an internal reporting system, this layer can be very lightweight.</p><h4 id="Frontend"><a href="#Frontend" class="headerlink" title="Frontend"></a>Frontend</h4><p>We made our front end more user-friendly by leveraging <a href="https://getbootstrap.com/">Bootstrap</a> (for page layout/CSS) and <a href="https://www.highcharts.com/">Highcharts</a> (for the charting module). <a href="https://nodejs.org/" target="_blank" rel="noopener">Node.js</a> is used to communicate with the serving layer.</p><h2 id="Stage-3-Expand-Ecosystem"><a href="#Stage-3-Expand-Ecosystem" class="headerlink" title="Stage 3: Expand Ecosystem"></a>Stage 3: Expand Ecosystem</h2><p>As the number of reported metrics increased, other problems / requirements emerged. We integrated more components to solve real-world problems such as monitoring, metric self-service, and online machine learning.<br><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/Dashboard Evolution.003.png" alt=""></p><h3 id="Abnormal-Detection"><a href="#Abnormal-Detection" class="headerlink" title="Abnormal Detection"></a>Abnormal Detection</h3><p>As the number of pipelines increases, the probability of a broken pipeline also increases.</p><p>First of all, data dependencies matter. A data dependency here is really a topology dependency: pipeline A needs pipeline B’s output as input, so A depends on B. When pipeline B breaks or is delayed, pipeline A is blocked too, and the metrics are delayed. For time-sensitive metrics, detecting such issues becomes even more important.</p><p>Secondly, we’d like to detect abnormal values in metrics. For example, if Ads CTR drops to 0 after a release, we definitely want to be notified immediately, instead of finding out one week later that the past week’s revenue is blank.</p><p>So we implemented a detection system. 
For metric delays, we label each pipeline with its expected running time, and the detection system builds a data dependency graph from the configs and source code of the data pipelines. Then we scan the intermediate data on HDFS to see whether each pipeline has produced its output. </p><p>In this way, we have enough information to know the status of each pipeline. If a delay occurs, we are able to tell why.</p><p>To detect metric anomalies, we also define metric detectors, which use configurable parameters to control the detection algorithm. With a time series of metric data as input, a detector can apply any of a number of algorithms to detect anomalies.</p><p>One surprising finding is that a fixed threshold is one of our best friends when the quality of the source data cannot be guaranteed. (To be illustrated)</p><h3 id="Ad-hoc-Query"><a href="#Ad-hoc-Query" class="headerlink" title="Ad-hoc Query"></a>Ad-hoc Query</h3><p>Now that we have more and more data, people wish we could help answer their questions using our existing data, QUICKLY. With our previous architecture, such an enquiry often required us to build and launch a new data pipeline (even if it would not run regularly), because the answer could not be computed from our existing metrics. This is not cost-effective for the following reasons: 1. it takes human resources to finish such a task; 2. the communication cost is unimaginably high, especially for people in remote offices. </p><p>To make good use of the data and reduce engineering effort, we decided to build a data warehouse that stores atomic data for both metric computation and ad-hoc queries. Since the enquiries are mostly relational (such as “what are the top 100 publishers (by clicks) among middle-aged users?”), we use Hive (<a href="https://hive.apache.org/" target="_blank" rel="noopener">https://hive.apache.org/</a>) to build our data warehouse. 
With the support of HCatalog(<a href="http://hortonworks.com/hadoop/hcatalog/" target="_blank" rel="noopener">http://hortonworks.com/hadoop/hcatalog/</a>), Pig and other MapReduce jobs can easily access and operate on data in Hive.</p><p>A user interface is also necessary for a data warehouse. Currently we are using our own UI, and we will switch to Hue(<a href="http://gethue.com/" target="_blank" rel="noopener">http://gethue.com/</a>) in the near future.</p><h3 id="Logging-amp-Model-Feedback"><a href="#Logging-amp-Model-Feedback" class="headerlink" title="Logging &amp; Model Feedback"></a>Logging &amp; Model Feedback</h3><p>Now let’s also think about the essence of our system: data. So far, we collect only the user feedback data from the UI layer. For a recommendation system, logging runtime intermediate results is helpful for building ML models when combined with the user feedback data. But it’s not wise to return these results to the front end (user UI), because that would increase latency, so the data should be logged on the server side and sent elsewhere (in our case, Hadoop). The server also has its own duties, so this logging pipeline should be as lightweight as possible. </p><p>We finally chose <a href="http://flume.apache.org/" target="_blank" rel="noopener">Flume</a> as our logging framework. A Flume client is added to each server, and it helps us send data to HDFS via memory &amp; sockets. No disk I/O is needed.</p><p>Once we have logs from both the server and the user, we can join them to get labeled training data and run ML training pipelines to generate a new model. After the necessary validation, we can upload the model to the recommendation server.</p><p>Also, a handy way to check error logs is helpful for debugging issues. We deployed a <a href="www.splunk.com/">Splunk</a> instance to collect system logs and provide a real-time search UI to team members. 
Engineers can then easily check what types of errors happened on each machine in a unified way instead of logging in to each machine remotely.</p><h2 id="Lesson-Learned"><a href="#Lesson-Learned" class="headerlink" title="Lesson Learned"></a>Lesson Learned</h2><h3 id="Simplify-the-Metrics"><a href="#Simplify-the-Metrics" class="headerlink" title="Simplify the Metrics"></a>Simplify the Metrics</h3><p>Complicated metrics are hard to understand, hard to compare, and error-prone in the computation stage. Also, try to limit the number of metrics used to make important decisions: with too many metrics for one optimization goal, we can seldom decide anything, because some numbers encourage us to move forward while others say NO.</p><h3 id="Manage-Metrics-Life-Cycles"><a href="#Manage-Metrics-Life-Cycles" class="headerlink" title="Manage Metrics Life Cycles"></a>Manage Metrics Life Cycles</h3><p>When a metric is no longer used, retire it. It takes resources to maintain a metric and its pipelines. It’s much easier to retire a metric than to deprecate existing code: just stop the pipeline. With version control (or, even better, continuous integration) support, we can restart it quickly with a simple click.</p><p>One way to identify which metrics should be retired is “don’t ask, trust the numbers”. People are afraid of losing what they already have, so when we ask “May I retire these metrics?”, we get “please don’t” in most cases. In our team, we set up a service that collects logs to count how many times each metric has been requested by users, pick a threshold to filter candidates, and stop the pipelines with or without asking. We do this monthly.</p><h3 id="Define-Project-Goal-Clearly"><a href="#Define-Project-Goal-Clearly" class="headerlink" title="Define Project Goal Clearly"></a>Define Project Goal Clearly</h3><ul><li>Who will be our major users? Is it an internal tool? Or is it for business partners or real users? </li><li>How long is this project supposed to be supported? 
How long is the data supposed to be supported?<br>The answers greatly impact our future design decisions and resource allocation.</li></ul><h3 id="Understand-and-Clean-data"><a href="#Understand-and-Clean-data" class="headerlink" title="Understand and Clean data"></a>Understand and Clean data</h3><ul><li>Have a data expert on the team so that we won’t get lost when facing data quality problems</li><li>Manage dimension explosion. We might get more data than we expect because of some special cases. Try to remove or lower the impact of low-quality data, for example by filtering out aggregation records with low counts.</li></ul><p>Finally, always ask questions. When something is weird, ask. When you think something is interesting, ask. The more we ask, the more we get.</p>]]></content>
    
    <summary type="html">
    
      &lt;p&gt;&lt;img src=&quot;https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/data_system.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;h2 id=&quot;Preface&quot;&gt;&lt;a href=&quot;#Preface&quot; class=&quot;headerlink&quot; title=&quot;Preface&quot;&gt;&lt;/a&gt;Preface&lt;/h2&gt;&lt;p&gt;In the past 2 years, I spent about 70% of my working time building, breaking, and fixing data products. This article is a brief retrospective on what I learned about building the whole system, and on which tools can be plugged in as components.&lt;/p&gt;
&lt;h3 id=&quot;Goal-of-a-data-System&quot;&gt;&lt;a href=&quot;#Goal-of-a-data-System&quot; class=&quot;headerlink&quot; title=&quot;Goal of a data System&quot;&gt;&lt;/a&gt;Goal of a data System&lt;/h3&gt;&lt;p&gt;We use data to understand reality and improve our product. This is the primary goal of a data/metric system. A good data system answers questions, a better data system identifies root causes, and an even better data system helps improve the whole system directly. &lt;/p&gt;
&lt;h3 id=&quot;Use-cases&quot;&gt;&lt;a href=&quot;#Use-cases&quot; class=&quot;headerlink&quot; title=&quot;Use cases&quot;&gt;&lt;/a&gt;Use cases&lt;/h3&gt;&lt;p&gt;At Yahoo!, the data platform I work on mainly supports a personalization (recommendation) system. As the recommendation system iterated, we tracked and anticipated the actual use cases the team needed in order to understand or improve it. The major use cases for our system include: &lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Understand system performance with reports from different key metrics&lt;/li&gt;
&lt;li&gt;Detect and identify metric anomalies and data pipeline failures&lt;/li&gt;
&lt;li&gt;Collect user feedback data to improve the system online in short cycles&lt;/li&gt;
&lt;li&gt;Make it easy for PM/Dev/Scientist to play with data&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;At different stages, we focused on different aspects and used different tools and techniques to solve problems. Let me illustrate.&lt;/p&gt;
    
    </summary>
    
      <category term="Engineering" scheme="http://fwz.github.io/categories/Engineering/"/>
    
      <category term="Big Data" scheme="http://fwz.github.io/categories/Engineering/Big-Data/"/>
    
    
      <category term="Hive" scheme="http://fwz.github.io/tags/Hive/"/>
    
      <category term="Pig" scheme="http://fwz.github.io/tags/Pig/"/>
    
      <category term="Hadoop" scheme="http://fwz.github.io/tags/Hadoop/"/>
    
      <category term="HBase" scheme="http://fwz.github.io/tags/HBase/"/>
    
      <category term="Oozie" scheme="http://fwz.github.io/tags/Oozie/"/>
    
      <category term="Flume" scheme="http://fwz.github.io/tags/Flume/"/>
    
      <category term="Splunk" scheme="http://fwz.github.io/tags/Splunk/"/>
    
  </entry>
  
  <entry>
    <title>Resistance: Avalon Experience</title>
    <link href="http://fwz.github.io/2015/02/22/ResistanceAvalon%E4%BD%93%E9%AA%8C/"/>
    <id>http://fwz.github.io/2015/02/22/ResistanceAvalon体验/</id>
    <published>2015-02-22T12:19:52.000Z</published>
    <updated>2019-05-02T17:20:41.782Z</updated>
    
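    <content type="html"><![CDATA[<p><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/avalon.jpg" alt=""></p><p>Over the Lunar New Year I got together with a few brothers and sisters to play board games. Neo, our resident master, brought several games along. We played one round of Felix: The Cat in the Sack, and once everyone had arrived we played two games of Avalon, winning some and losing some. It was great fun.</p><p>Here is what I learned after a few games:</p><ul><li>The voting phase carries the most information. Keep track of how the votes fall and whether each player’s vote is consistent with the reasoning they gave earlier, and use that for the next step of deduction. Judge a player’s identity by their actions rather than their words.</li><li>Deducing identities requires premises. If the current premises cannot reasonably explain what has actually happened (votes and quest results), the assumed identities are probably wrong.</li><li>Barring special circumstances, a player who is on the proposed team yet votes against it is very likely on the good side, because they believe an evil player is on the team. But it is also possible that an evil player, sure that the vote will fail anyway, votes against it as cover and pushes the assignment to the next round.<a id="more"></a></li><li>Good side: once you have deduced who is good (or who is Merlin), stand up firmly and vouch for them. If you are right, you are more likely to win the good players’ votes, and you also protect Merlin (both Merlin and the evil side know how the sides are distributed, so this can confuse the evil side).<br>But if you guess wrong, the evil side knows you are not Merlin, and you can no longer take the hit for him. At a more advanced level, though, Merlin can deliberately vouch for an evil character to disguise his own identity. The loyal servant is the role that tests deduction the most, and it is a lot of fun.</li><li>Evil side: always claim to be good. Don’t think about how to win votes; always reason from the good side’s point of view, and watch your tone and wording. Someone will always speak up with doubts along the way; find Merlin among the doubters. Of course, when it comes to voting, vote however you need to. The evil side tests your disguise, concealment, and framing of others.</li><li>Merlin: disguise yourself. Wait until one or two players have started focusing fire on the evil side before joining in. Meanwhile, try to find Percival.</li></ul><h2 id="附录：简单版规则"><a href="#附录：简单版规则" class="headerlink" title="附录：简单版规则"></a>Appendix: Simplified Rules</h2><p>The game has two sides: good and evil. A game consists of 5 quests, and each quest is always won by one side.</p><h3 id="游戏胜利条件："><a href="#游戏胜利条件：" class="headerlink" title="游戏胜利条件："></a>Winning the game:</h3><p>The good side includes the special character Merlin and the loyal servants. The good side wins by winning 3 quests without the evil side identifying Merlin.<br>The evil side wins immediately upon winning 3 quests, or by correctly identifying Merlin after the good side has won 3 quests.</p><h3 id="任务胜利条件："><a href="#任务胜利条件：" class="headerlink" title="任务胜利条件："></a>Winning a quest:</h3><p>In each quest, players take turns as the “leader”. The leader makes an initial assignment of quest slots (the number of participants each quest requires changes as the game progresses), everyone discusses, and then the leader makes the final assignment. All players vote on the assignment. If it is not approved by more than half, that player’s leadership fails, and the next player becomes leader and restarts this quest round (the quest itself has not failed). If more than half approve, the quest begins. A quest succeeds if nobody sabotages it, and fails if an evil player has made it onto the team and plays a sabotage card (with more players there is a special rule that lets a certain quest tolerate one evil player).</p><h3 id="七人局特殊人物："><a href="#七人局特殊人物：" class="headerlink" title="七人局特殊人物："></a>Special characters in a seven-player game:</h3><p>Percival: good side. Knows which two players are Merlin and Morgana, but not which is which. A seven-player game has one extra evil player, so Percival is needed to take the hit for Merlin, and Merlin also needs to find Percival.<br>Morgana: evil side. Her main job is to confuse Percival.</p>]]></content>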
    
    <summary type="html">
    
      &lt;p&gt;&lt;img src=&quot;https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/avalon.jpg&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
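&lt;p&gt;Over the Lunar New Year I got together with a few brothers and sisters to play board games. Neo, our resident master, brought several games along. We played one round of Felix: The Cat in the Sack, and once everyone had arrived we played two games of Avalon, winning some and losing some. It was great fun.&lt;/p&gt;
&lt;p&gt;Here is what I learned after a few games:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The voting phase carries the most information. Keep track of how the votes fall and whether each player’s vote is consistent with the reasoning they gave earlier, and use that for the next step of deduction. Judge a player’s identity by their actions rather than their words.&lt;/li&gt;
&lt;li&gt;Deducing identities requires premises. If the current premises cannot reasonably explain what has actually happened (votes and quest results), the assumed identities are probably wrong.&lt;/li&gt;
&lt;li&gt;Barring special circumstances, a player who is on the proposed team yet votes against it is very likely on the good side, because they believe an evil player is on the team. But it is also possible that an evil player, sure that the vote will fail anyway, votes against it as cover and pushes the assignment to the next round.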
    
    </summary>
    
      <category term="Game" scheme="http://fwz.github.io/categories/Game/"/>
    
      <category term="Board Game" scheme="http://fwz.github.io/categories/Game/Board-Game/"/>
    
    
      <category term="Avalon" scheme="http://fwz.github.io/tags/Avalon/"/>
    
  </entry>
  
  <entry>
    <title>Git: under the basics</title>
    <link href="http://fwz.github.io/2014/11/01/Git-under-the-basics/"/>
    <id>http://fwz.github.io/2014/11/01/Git-under-the-basics/</id>
    <published>2014-11-01T12:04:28.000Z</published>
    <updated>2019-05-02T17:20:41.760Z</updated>
    
    <content type="html"><![CDATA[<p><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/git-workflow-text.png" alt=""><br>My boss told me that my goal this quarter is to work on Continuous Integration for our current product, and all of a sudden I realized there is a big gap between that goal and my current skills. The first thing that came to my mind was: “Ohhh, I am still not quite familiar with Git”. After a short period of panic, I sat down to learn about Git. Here are my notes.</p><p>If you think you can learn Git from the manual once you know how to branch, commit and merge, you will probably be disappointed. Git is very flexible, but it does things in its own novel way, so some understanding of its internals is necessary for mastering it and helps when you look things up in the manual. For example, I had heard many terms such as “HEAD”, “Index”, “Ref” and “Staging Area”, but I could not tell exactly what they were, and I did not even know how Git works. After some digging, I wrapped up the very basics in this post.</p><a id="more"></a><hr><h1 id="Terms"><a href="#Terms" class="headerlink" title="Terms"></a>Terms</h1><h3 id="HEAD"><a href="#HEAD" class="headerlink" title="HEAD"></a>HEAD</h3><p>The first one is “HEAD”. 
It can be described as any of the following:</p><ul><li>The symbolic name for the commit you’re working on top of.</li><li>It always points to the most recent commit of the checked-out branch.</li><li>It is the parent of your next commit.</li></ul><p>When you commit, the state of the current branch changes, and this change is visible through HEAD.</p><p>To navigate from HEAD, we can use the <code>^</code> and <code>~</code> notations:</p><ul><li>HEAD^ -&gt; the (first) parent of HEAD, i.e. the commit before HEAD</li><li>HEAD~{Number} -&gt; the {Number}th commit before HEAD, following first parents</li></ul><p>If we navigate to a git repo and cat the <code>.git/HEAD</code> file, its content tells us to look at the file refs/heads/master in the .git directory to find out where HEAD points.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">$ cat .git/HEAD</span><br><span class="line">ref: refs/heads/master</span><br><span class="line">$ cat .git/refs/heads/master</span><br><span class="line">9c65d51b2c8f405debdf9b100505f814981e8940</span><br></pre></td></tr></table></figure><p>The file <code>.git/refs/heads/{CURRENT_BRANCH}</code> holds the hash of the last commit on this branch.</p><p>After we switch branches, we can see that HEAD now points to another branch.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">$ git checkout hotfix</span><br><span class="line">Switched to branch &apos;hotfix&apos;</span><br><span class="line">$ cat .git/HEAD</span><br><span class="line">ref: refs/heads/hotfix</span><br></pre></td></tr></table></figure><p>Most git commands that change the working tree start by changing 
HEAD.</p><hr><p>The second term is the index.</p><h3 id="Index"><a href="#Index" class="headerlink" title="Index"></a>Index</h3><ul><li>The index is where you place the files you want committed to the git repository.</li><li>Aliases<ul><li>Cache</li><li>Directory cache</li><li>Staging area</li><li>Staged files</li></ul></li></ul><p>After we <code>git add &lt;file&gt;</code>, the file is in the index. See the light blue box in the following workflow chart. </p><p><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/git-regular-workflow.png" alt="git-workflow"></p><hr><h3 id="Ref"><a href="#Ref" class="headerlink" title="Ref"></a>Ref</h3><ul><li>A ref is not mysterious: it is just a named reference (pointer) to a commit, such as a branch or a tag.</li></ul><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">$ git show-ref</span><br><span class="line">beb8ee5d9a968842a8ec3a5a689b2e993ef02e40 refs/heads/master</span><br><span class="line">15eca7bbc33f148cd0072cdc0ff10011951bb98a refs/remotes/origin/master</span><br><span class="line">15eca7bbc33f148cd0072cdc0ff10011951bb98a refs/remotes/origin/HEAD</span><br><span class="line">3518ffa48c41a7a1a1a670975e501b8eeae259ea refs/stash</span><br><span class="line">7beec4dfaa7b0af8b8a8c4120ad782327049404f refs/tags/ci-trunk-0.1</span><br></pre></td></tr></table></figure><p>We can use <code>git cat-file -p</code> to pretty-print the content of the object a reference points to.<br><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">$ 
git cat-file -p refs/heads/master</span><br><span class="line">tree 5a20d43b086432fbee9775a1a1f042523733b807</span><br><span class="line">parent 29b9e11cf64b8a1901341be6a20899ec3bd306ca</span><br><span class="line">author wenzhong &lt;wenzhong@example.com&gt; 1413274718 +0800</span><br><span class="line">committer wenzhong &lt;wenzhong@example.com&gt; 1413274718 +0800</span><br><span class="line"></span><br><span class="line">remove p13nsingal checking pipeline in bundle</span><br></pre></td></tr></table></figure></p><hr><h2 id="How-Git-store-history"><a href="#How-Git-store-history" class="headerlink" title="How Git store history?"></a>How Git stores history?</h2><p>OK, it’s time to take a look at how git works.</p><p>The secret lies in the “.git” directory, whose “objects” subdirectory stores all objects and the history of this repository.</p><table><thead><tr><th>Name</th><th>Usage </th></tr></thead><tbody><tr><td> .git/HEAD</td><td>file, points to the current branch </td></tr><tr><td> .git/index</td><td>file, stores staging area info </td></tr><tr><td> .git/refs</td><td>directory, stores pointers to commits</td></tr><tr><td> .git/objects</td><td>directory, stores all data </td></tr></tbody></table><p>For a newly initialized repo, the <code>objects</code> directory looks like this:<br><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">$ tree .git/objects</span><br><span class="line">.git/objects</span><br><span class="line">├─ info</span><br><span class="line">└─ pack</span><br></pre></td></tr></table></figure></p><p>After a simple commit, the <code>objects</code> directory has 3 new objects.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span 
class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line">$ echo &quot;test object v1&quot; &gt; README.md</span><br><span class="line">$ git add README.md</span><br><span class="line">$ git commit -m &quot;initial commit&quot;</span><br><span class="line">$ tree .git/objects</span><br><span class="line">.git/objects</span><br><span class="line">├─ ab</span><br><span class="line">│  └ b08c95ed3c6e5623f0e5b49bcdff0cbac74d4a</span><br><span class="line">├─ c0</span><br><span class="line">│  └ baa8366339e7e0d2e8a1f4d2a6b70e38ce9164</span><br><span class="line">├─ ef</span><br><span class="line">│  └ 0f5c785b315ad24cbd5997b67090fc71b7c5ce</span><br><span class="line">├─ info</span><br><span class="line">└─ pack</span><br></pre></td></tr></table></figure><p>So, what are these? There are mainly 3 types of objects in git: “Commit”, “Tree” and “Blob”.</p><ul><li>A <em>BLOB</em> is the content of a file at one version.</li><li>A <em>TREE</em> is a directory, listing the blobs and sub-trees (sub-directories) under it.</li><li>A <em>COMMIT</em> points to the repository tree it is based on, and also contains commit info (author, message).</li></ul><p><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/git-objects.png" alt="Git objects"></p><p><code>cat-file</code> is our friend. Let’s examine the objects with it. 
The “-t” option tells us the type of an object.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">$ git cat-file -t c0baa8366339e7e0d2e8a1f4d2a6b70e38ce9164</span><br><span class="line">commit</span><br><span class="line">$ git cat-file -p c0baa8366339e7e0d2e8a1f4d2a6b70e38ce9164</span><br><span class="line">tree ef0f5c785b315ad24cbd5997b67090fc71b7c5ce</span><br><span class="line">author wenzhong &lt;example@gmail.com&gt; 1413733136 +0800</span><br><span class="line">committer wenzhong &lt;example@gmail.com&gt; 1413733136 +0800</span><br><span class="line"></span><br><span class="line">initial commit</span><br></pre></td></tr></table></figure><p>It’s a commit, and it references a tree object “ef0f….”.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">$ git cat-file -t ef0f5c785b315ad24cbd5997b67090fc71b7c5ce</span><br><span class="line">tree</span><br><span class="line">$ git cat-file -p ef0f5c785b315ad24cbd5997b67090fc71b7c5ce</span><br><span class="line">100644 blob abb08c95ed3c6e5623f0e5b49bcdff0cbac74d4a    README.md</span><br></pre></td></tr></table></figure><p>The tree object in turn contains a blob object.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">$ git cat-file -t abb08c95ed3c6e5623f0e5b49bcdff0cbac74d4a</span><br><span 
class="line">blob</span><br><span class="line">$ git cat-file -p abb08c95ed3c6e5623f0e5b49bcdff0cbac74d4a</span><br><span class="line">test object v1</span><br></pre></td></tr></table></figure><p>Then, how about multiple commits? How does each commit know which commit it is based on?<br>Each commit object holds a parent pointer to the previous commit.<br><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/git-commit-parents.png" alt="Multiple commits"></p><p>Now we should have a basic understanding of how git stores our history in <code>.git/objects</code>.</p><hr><h3 id="SHA1-digest"><a href="#SHA1-digest" class="headerlink" title="SHA1 digest"></a>SHA1 digest</h3><ul><li>In Git, objects are named and located by their SHA-1 digests.</li><li>SHA-1 produces a 160-bit (20-byte) digest, written as 40 hex characters.</li><li>Object <strong>ab</strong>b08c95ed3c6e5623f0e5b49bcdff0cbac74d4a is stored in the “ab” directory: the first two hex characters name the directory, and the remaining 38 name the file. </li></ul><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">.git/objects</span><br><span class="line">├─ ab</span><br><span class="line">│  └ b08c95ed3c6e5623f0e5b49bcdff0cbac74d4a</span><br></pre></td></tr></table></figure><h3 id="What-about-SHA1-collision"><a href="#What-about-SHA1-collision" class="headerlink" title="What about SHA1 collision"></a>What about SHA1 collision</h3><blockquote><p>really really really damn unlikely<br>– Linus</p></blockquote><ul><li>But if it happens<ul><li>no new object is created.</li><li>the commit ends up pointing to the old object.</li><li>it could be noticed in <code>git pull</code> or <code>git clone</code>, or in anything that involves a tree diff.</li><li>Fix it by adding a minor comment, so that the content (and therefore the hash) changes.</li></ul></li></ul><p>Here’s an example to give you an idea of what it would take to get a SHA-1 collision. 
If all 6.5 billion humans on Earth were programming, and every second, each one was producing code that was the equivalent of the entire Linux kernel history (1 million Git objects) and pushing it into one enormous Git repository, it would take 5 years until that repository contained enough objects to have a 50% probability of a single SHA-1 object collision. A higher probability exists that every member of your programming team will be attacked and killed by wolves in unrelated incidents on the same night.</p><hr><h1 id="Branching"><a href="#Branching" class="headerlink" title="Branching"></a>Branching</h1><h2 id="Branch-is-cheap"><a href="#Branch-is-cheap" class="headerlink" title="Branch is cheap"></a>Branch is cheap</h2><ul><li>Question: What does “branching is cheap” mean in Git?</li><li>Answer: A branch in Git is simply a lightweight movable pointer to one of the existing commits, so creating or switching a branch only writes or moves a pointer.</li></ul><p><img src="img/branch.png" alt="branch"></p><h2 id="Create-a-branch"><a href="#Create-a-branch" class="headerlink" title="Create a branch"></a>Create a branch</h2><ul><li><p>When we run <code>git branch {name_of_branch}</code>, a few things happen:</p><ul><li>A reference to the local branch is created at .git/refs/heads/{name_of_branch}. 
It points to the commit that HEAD currently points to.<br><img src="img/from-branch-1.png" alt="from"> </li></ul></li><li><p>Switching branches moves HEAD<br><img src="img/to-branch-2.png" alt="to"></p></li></ul><h2 id="Merge"><a href="#Merge" class="headerlink" title="Merge"></a>Merge</h2><p>From the branch you are currently on, use <code>git merge $FROM_BRANCH</code> to merge the changes from $FROM_BRANCH.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line">$ git checkout master</span><br><span class="line">$ git merge try_branch</span><br><span class="line">$ git <span class="built_in">log</span> --graph --pretty=<span class="string">'%h %s'</span></span><br><span class="line">*   3729056 merge the try_branch branch</span><br><span class="line">|\</span><br><span class="line">| * 802e6ea first commit on try_branch</span><br><span class="line">* | 4292dd0 add example of pretty <span class="built_in">print</span> <span class="keyword">in</span> git <span class="built_in">log</span></span><br><span class="line">* | 986eda3 add useful git <span class="built_in">log</span> options usage</span><br><span class="line">|/</span><br><span class="line">* 9b7dcc9 add some further change</span><br><span class="line">* 6e2494b init commit</span><br></pre></td></tr></table></figure><ul><li>Another useful way to figure out what state your branches are in is to filter the output of <code>git branch -v</code> to the branches that you have or have not yet merged into the branch you’re currently on. 
The useful <code>--merged</code> and <code>--no-merged</code> options have been available in Git since version 1.5.6 for this purpose. To see which branches are already merged into the branch you’re on, you can run <code>git branch --merged</code></li></ul><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">$ git branch --merged</span><br><span class="line">* master</span><br><span class="line">  try_branch</span><br></pre></td></tr></table></figure><p>Because I have already merged the “try_branch” branch, it shows up here. How about this?</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">$ git checkout -b hotfix</span><br><span class="line">$ git branch --no-merged</span><br><span class="line">$</span><br></pre></td></tr></table></figure><p>No branch is unmerged? Why?</p><p>The reason is that we have just created the branch and made no commit on it, so the master and hotfix branches both point to the same commit, 3729056.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">$ git branch -v</span><br><span class="line">  hotfix     3729056 merge the try_branch branch</span><br><span class="line">* master     3729056 merge the try_branch branch</span><br><span class="line">  try_branch 802e6ea first commit on try_branch</span><br></pre></td></tr></table></figure><p>Now do something on the hotfix branch. 
And rerun the <code>git branch --no-merged</code></p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">$ git commit -a -m <span class="string">"apply a hotfix"</span></span><br><span class="line">[hotfix f7963a5] apply a hotfix</span><br><span class="line"> 1 file changed, 61 insertions(+)</span><br><span class="line">$ git co master</span><br><span class="line">Switched to branch <span class="string">'master'</span></span><br><span class="line">$ git branch --no-merged</span><br><span class="line">hotfix</span><br></pre></td></tr></table></figure><p>Good, now merge the new commit.<br><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line">$ git merge hotfix</span><br><span class="line">Auto-merging README.md</span><br><span class="line">CONFLICT (content): Merge conflict <span class="keyword">in</span> README.md</span><br><span class="line">Automatic merge failed; fix conflicts and <span 
class="keyword">then</span> commit the result.</span><br><span class="line">$ vim README.md</span><br><span class="line">$ git add README.md</span><br><span class="line">$ git commit -m <span class="string">"merge hotfix"</span></span><br><span class="line">[master cc87eec] merge hotfix</span><br><span class="line">$ git <span class="built_in">log</span> --graph --pretty=<span class="string">'%h %s'</span></span><br><span class="line">*   cc87eec merge hotfix</span><br><span class="line">|\</span><br><span class="line">| * 75828cc apply another hotfix</span><br><span class="line">| * f7963a5 apply a hotfix</span><br><span class="line">* | ec00aea apply change on master</span><br><span class="line">|/</span><br><span class="line">*   3729056 merge the try_branch branch</span><br><span class="line">|\</span><br><span class="line">| * 802e6ea first commit on try_branch</span><br><span class="line">* | 4292dd0 add example of pretty <span class="built_in">print</span> <span class="keyword">in</span> git <span class="built_in">log</span></span><br><span class="line">* | 986eda3 add useful git <span class="built_in">log</span> options usage</span><br><span class="line">|/</span><br><span class="line">* 9b7dcc9 add some further change</span><br><span class="line">* 6e2494b init commit</span><br></pre></td></tr></table></figure></p><hr><h2 id="Remote-Branches"><a href="#Remote-Branches" class="headerlink" title="Remote Branches"></a>Remote Branches</h2><p>It’s important to remember that everything we have done above is completely local. When you’re branching and merging, everything is done only in your local Git repository; no server communication happens.</p><p>This used to cause me a few headaches. Let’s add a remote repo (in this case, bitbucket.org. 
GitHub would work just as well).</p><pre><code>git remote add origin ssh://git@bitbucket.org/wenzhong/git_learning.git</code></pre><p>Then, after creating an empty project on bitbucket, push all refs to this origin.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line">$ git push -u origin --all</span><br><span class="line">Warning: Permanently added the RSA host key <span class="keyword">for</span> IP address <span class="string">'131.103.20.168'</span> to the list of known hosts.</span><br><span class="line">Counting objects: 30, <span class="keyword">done</span>.</span><br><span class="line">Delta compression using up to 8 threads.</span><br><span class="line">Compressing objects: 100% (20/20), <span class="keyword">done</span>.</span><br><span class="line">Writing objects: 100% (30/30), 4.44 KiB, <span class="keyword">done</span>.</span><br><span class="line">Total 30 (delta 9), reused 0 (delta 0)</span><br><span class="line">To ssh://git@bitbucket.org/wenzhong/git_learning.git</span><br><span class="line"> * [new branch]      hotfix -&gt; hotfix</span><br><span class="line"> * [new branch]      master -&gt; master</span><br><span class="line"> * [new branch]      try_branch -&gt; try_branch</span><br><span class="line">Branch hotfix <span class="built_in">set</span> up to track remote branch hotfix from origin.</span><br><span class="line">Branch master <span class="built_in">set</span> up to track remote branch master from origin.</span><br><span 
class="line">Branch try_branch <span class="built_in">set</span> up to track remote branch try_branch from origin.</span><br></pre></td></tr></table></figure><p>The <code>-u</code> option tells git to add an upstream (tracking) reference for every branch that is up to date or successfully pushed, so that an argument-less <code>git pull</code> knows where to pull from.</p><p>Now let’s pretend there are other collaborators who clone this code and push some updates. I clone the repository to another location on my laptop (let’s call my first local repo “repo1” and this new local repo “repo2”) and add some changes.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">$ git commit -a -m &quot;some changes&quot;</span><br><span class="line">[master 93965e6] some changes</span><br><span class="line"> 1 file changed, 48 insertions(+), 47 deletions(-)</span><br><span class="line">$ git push origin</span><br><span class="line">To git@bitbucket.org:wenzhong/git_learning.git</span><br><span class="line">   cc87eec..93965e6  master -&gt; master</span><br></pre></td></tr></table></figure><p>Now there are two different states. One is the remote repo (bitbucket.org), which has been updated by repo2. The other is repo1, which is still identical to the state of the remote repo before repo2 pushed its changes. 
repo1 knows nothing about repo2. Now I make some changes in “repo1” and try to push my work to the remote repo.</p><hr><h3 id="Synchronize-work"><a href="#Synchronize-work" class="headerlink" title="Synchronize work"></a>Synchronize work</h3><p>Now I run <code>git fetch origin</code> from repo1.<br>This command looks up which server origin is (in this case, it’s bitbucket.org), fetches any data from it that I don’t yet have, and updates my local database, moving my origin/master pointer to its new, more up-to-date position.</p><p>At this point repo1 still has two different branches: origin/master and the local master. origin/master includes the changes from repo2, while the local master includes the changes from repo1, which have not been pushed yet. In other words, origin/master is a local reference to the master branch on origin.</p><p>Now I want to share my work by pushing it up to the remote. My local branches aren’t automatically synchronized with the remotes I write to; I have to push each branch explicitly.</p><p>To do that, use <code>git push origin master</code>. Note that <code>master</code> is the name of my local branch. You can also use <code>git push origin master:new_master</code> to create a new_master branch on the remote “origin”. The next time repo2 fetches from the server, it will get a reference to the server’s version of new_master under the remote branch origin/new_master.</p><pre><code>$ git push origin master:new_master
...
To ssh://git@bitbucket.org/wenzhong/git_learning.git
 * [new branch]      master -&gt; new_master</code></pre><p>Now, repo2 can fetch this new branch with <code>git fetch origin</code>:</p><pre><code>$ git fetch origin
From bitbucket.org:wenzhong/git_learning
 * [new branch]      new_master -&gt; origin/new_master</code></pre><p>It’s important to note that when you do a fetch that brings down new remote branches, you don’t automatically have local, editable copies of them. 
In other words, in this case, you don’t have a new new_master branch; you only have an origin/new_master pointer that you can’t modify. To get a local branch that you can work on, create one based on the remote branch:</p><pre><code>$ git checkout -b new_master origin/new_master
Branch new_master set up to track remote branch new_master from origin.
Switched to a new branch &apos;new_master&apos;</code></pre><p>So far so good. But wait, what happens if I push repo1’s master to origin/master? I assume there will be conflicts.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">$ git push origin</span><br><span class="line">To ssh://git@bitbucket.org/wenzhong/git_learning.git</span><br><span class="line"> ! [rejected]        master -&gt; master (non-fast-forward)</span><br><span class="line">error: failed to push some refs to <span class="string">'ssh://git@bitbucket.org/wenzhong/git_learning.git'</span></span><br><span class="line">hint: Updates were rejected because the tip of your current branch is behind</span><br><span class="line">hint: its remote counterpart. Merge the remote changes (e.g. 
<span class="string">'git pull'</span>)</span><br><span class="line">hint: before pushing again.</span><br><span class="line">hint: See the <span class="string">'Note about fast-forwards'</span> <span class="keyword">in</span> <span class="string">'git push --help'</span> <span class="keyword">for</span> details.</span><br></pre></td></tr></table></figure><p>Follow its hint and run <code>git pull origin master</code>.</p><hr><h3 id="Clean-your-change-before-Merging"><a href="#Clean-your-change-before-Merging" class="headerlink" title="Clean your change before Merging"></a>Clean your change before Merging</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">$ git pull origin master</span><br><span class="line">...</span><br><span class="line">From ssh://bitbucket.org/wenzhong/git_learning</span><br><span class="line"> * branch            master     -&gt; FETCH_HEAD</span><br><span class="line">Updating cc87eec..93965e6</span><br><span class="line">error: Your <span class="built_in">local</span> changes to the following files would be overwritten by merge:</span><br><span class="line">    README.md</span><br><span class="line">Please, commit your changes or stash them before you can merge.</span><br></pre></td></tr></table></figure><p>Git refuses to merge when the remote changes would overwrite uncommitted local changes, so that our work is not lost.</p><p>See <code>man git-merge</code><br>Warning: Running git merge with uncommitted changes is discouraged:<br>       while possible, it leaves you in a state that is hard to back out of<br>       in the case of a conflict.</p><p>So, let’s commit our current change (or do a <code>git stash</code>) and try to merge 
changes from origin/master brought by repo2.<br><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">$ git commit -m <span class="string">"add syncing remote &amp; local repo"</span></span><br><span class="line">[master 205e4ce] add syncing remote &amp; <span class="built_in">local</span> repo</span><br><span class="line"> 1 file changed, 43 insertions(+)</span><br><span class="line">$ git pull origin master</span><br><span class="line">From ssh://bitbucket.org/wenzhong/git_learning</span><br><span class="line"> * branch            master     -&gt; FETCH_HEAD</span><br><span class="line">Auto-merging README.md</span><br><span class="line">CONFLICT (content): Merge conflict <span class="keyword">in</span> README.md</span><br><span class="line">Automatic merge failed; fix conflicts and <span class="keyword">then</span> commit the result.</span><br></pre></td></tr></table></figure></p><p>That’s expected. Remember that in repo2 we moved the tips section to the bottom. (If you check out commit 986eda3, the tips are at the top of the README file; repo2 moved them to the bottom in commit 93965e6.) So resolve the conflicts 
and run <code>git commit -a -m &quot;merge changes from repo2 and apply my fix&quot;</code>.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">$ git push origin master</span><br><span class="line">...</span><br><span class="line">ssh://git@bitbucket.org/wenzhong/git_learning.git</span><br><span class="line">93965e6..586c16f  master -&gt; master</span><br></pre></td></tr></table></figure><p>That’s it.<br>To summarize, in repo1, we:</p><ul><li>fetch the changes from origin/master (last updated by repo2)</li><li>try to merge them with our local changes automatically (which fails)</li><li>resolve the conflicts and merge locally</li><li>push to origin/master</li></ul><hr><h3 id="Deleting-Remote-Branches"><a href="#Deleting-Remote-Branches" class="headerlink" title="Deleting Remote Branches"></a>Deleting Remote Branches</h3><p>Now I think the branch “new_master” has finished its duty, and I want to delete it from the remote server. I will use <code>git push origin :new_master</code>.  A way to remember this command is by recalling the <code>git push [remotename] [localbranch]:[remotebranch]</code> syntax that we went over a bit earlier. 
If you leave off the <code>[localbranch]</code> portion, then you’re basically saying, “Take nothing on my side and make it be <code>[remotebranch]</code>.” </p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">$ git push origin :new_master</span><br><span class="line">To ssh://git@bitbucket.org/wenzhong/git_learning.git</span><br><span class="line"> - [deleted]         new_master</span><br></pre></td></tr></table></figure><hr><h2 id="Rebasing"><a href="#Rebasing" class="headerlink" title="Rebasing"></a>Rebasing</h2><p>In Git, there are two main ways to integrate changes from one branch into another: the <code>merge</code> and the <code>rebase</code>. </p><ul><li>With the <code>rebase</code> command, we can take all the changes that were committed on one branch and replay them on another one. It works by:<ul><li>going to the <strong>common ancestor of the two branches</strong> (the one you’re on and the one you’re rebasing onto), </li><li>getting the diff introduced by each commit of the branch you’re on, </li><li>saving those diffs to temporary files, </li><li>resetting the current branch to the same commit as the branch you are rebasing onto, </li><li>and finally applying each change in turn.</li></ul></li></ul><hr><h2 id="Cherry-pick"><a href="#Cherry-pick" class="headerlink" title="Cherry-pick"></a>Cherry-pick</h2><p>A more selective way to bring in changes is <code>git cherry-pick</code>. </p><p>git cherry-pick - Apply the changes introduced by some existing commits</p><p>A typical use case is picking some (but not all) commits from a dev branch onto the master branch.</p><p>Another use case I can think of: when tracking down a bug, you might add debug info, commit and trigger CI to reproduce the problem, and then commit the fix. 
Using cherry-pick, you can then apply only the fix commit, so you don’t have to remove the debug code by hand.</p><hr><h2 id="Tagging"><a href="#Tagging" class="headerlink" title="Tagging"></a>Tagging</h2><ul><li>Branches are easy to move around and often refer to different commits as work is completed on them. </li><li><p>Branches are easily mutated, often temporary, and always changing.</p></li><li><p>If that’s the case, you may be wondering if there’s a way to permanently mark historical points in your project’s history. </p></li><li>Tags do exactly that, for things like major releases and big merges.</li></ul><p>git-tag - Create, list, delete or verify a tag object signed with GPG by modifying  tag reference in .git/refs/tags/</p><hr><h3 id="Describe-how-far-way-from-you-and-the-tag"><a href="#Describe-how-far-way-from-you-and-the-tag" class="headerlink" title="Describe how far way from you and the tag?"></a>Describe how far away you are from the tag</h3><ul><li>You can use <code>git describe</code> to show the most recent tag that is reachable from a commit.</li><li>The output of <code>git describe</code> has the form <code>&lt;tag&gt;-&lt;numCommits&gt;-g&lt;hash&gt;</code>, where <code>&lt;tag&gt;</code> is the closest ancestor tag in history, <code>&lt;numCommits&gt;</code> is how many commits away that tag is, and <code>&lt;hash&gt;</code> is the abbreviated hash of the commit being described.</li></ul><hr><h2 id="Stash"><a href="#Stash" class="headerlink" title="Stash"></a>Stash</h2><p>Think about this situation:</p><ul><li>You are working on a new feature, modifying files in the working directory and/or index,</li><li>and you find out you need to fix a bug on a different branch. 
</li><li>You can’t just switch to (or create) a different branch, because your unfinished changes would either block the switch or follow you to the other branch.</li></ul><h3 id="git-stash"><a href="#git-stash" class="headerlink" title="git stash"></a>git stash</h3><ol><li><code>git stash</code> saves the changes in your working directory and index to a safe place, and resets both to the most recent commit.</li><li><code>git stash pop</code> later reapplies the saved changes to your working directory.</li></ol><p>Of course, you could commit your current change, move HEAD to HEAD^, then create a branch. But sometimes your current change is not complete enough to be a commit, and you don’t want dirty commits added to your repo. <code>git stash</code> gives you a cleaner way to do this.</p><hr><h1 id="Hooks"><a href="#Hooks" class="headerlink" title="Hooks"></a>Hooks</h1><ul><li>Git hooks let you run custom actions before or after certain git operations.</li><li>For example, a pre-commit script is called before each commit. Here we run a simple ‘run_tests.py’ to run the tests before anything is actually committed.</li></ul><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#!/usr/bin/env bash</span></span><br><span class="line"><span class="keyword">if</span> git diff-index --quiet HEAD --; <span class="keyword">then</span></span><br><span class="line">    <span class="comment">#no changes between index and working copy; just run tests</span></span><br><span class="line">    bin/run_tests.py</span><br><span class="line">    RESULT=$?</span><br><span class="line"><span 
class="keyword">else</span></span><br><span class="line">    <span class="comment">#Test the version that's about to be committed</span></span><br><span class="line">    <span class="comment">#stashing all unindexed changes</span></span><br><span class="line">    git stash -q --keep-index</span><br><span class="line">    bin/run_tests.py</span><br><span class="line">    RESULT=$?</span><br><span class="line">    git stash pop -q</span><br><span class="line"><span class="keyword">fi</span></span><br><span class="line">[ <span class="variable">$RESULT</span> -ne 0 ] &amp;&amp; <span class="built_in">exit</span> 1</span><br><span class="line"><span class="built_in">exit</span> 0</span><br></pre></td></tr></table></figure><hr><h1 id="Tips-and-Tricks"><a href="#Tips-and-Tricks" class="headerlink" title="Tips and Tricks"></a>Tips and Tricks</h1><h3 id="Inspect-commits"><a href="#Inspect-commits" class="headerlink" title="Inspect commits"></a>Inspect commits</h3><p>Many options can be used with <code>git log</code>. 
Some are extremely useful:</p><ul><li>git log --graph</li><li>git log --stat</li><li>git log --since="2013-10-01" --before="2014-03-01" --author=fwz</li></ul><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">git <span class="built_in">log</span> --pretty=<span class="string">"%h - %ad - %an - %s"</span></span><br><span class="line"></span><br><span class="line">986eda3 - Sat Mar 1 20:15:12 2014 +0800 - fwz - add useful git <span class="built_in">log</span> options usage</span><br><span class="line">9b7dcc9 - Sat Mar 1 20:01:06 2014 +0800 - fwz - add some further change</span><br><span class="line">6e2494b - Sat Mar 1 19:44:56 2014 +0800 - fwz - init commit</span><br></pre></td></tr></table></figure><ul><li><code>git show</code>: reports the changes introduced by the most recent commit.</li></ul><h3 id="Auto-completion"><a href="#Auto-completion" class="headerlink" title="Auto completion"></a>Auto completion</h3><ul><li>Git comes with a nice auto-completion script for <strong>Bash</strong> users.  
</li><li>Get the latest git-completion.bash from <a href="https://raw.github.com/git/git/master/contrib/completion/git-completion.bash" target="_blank" rel="noopener">Github</a></li><li>Put it in your HOME directory</li><li>Add <code>source ~/git-completion.bash</code> to your shell startup file so that it is loaded when you log in.</li><li>This also works with options, which is probably more useful. For instance, if you’re running a git log command and can’t remember one of the options, you can start typing it and press Tab to see what matches:<figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">$ git <span class="built_in">log</span> --s&lt;tab&gt;</span><br><span class="line">--shortstat  --since=  --src-prefix=  --<span class="built_in">stat</span>   --summary</span><br></pre></td></tr></table></figure></li></ul><p>That’s a pretty nice trick and may save you some time and documentation reading.</p><p>If you use zsh, you will find even more pleasant surprises.</p><h3 id="Aliases"><a href="#Aliases" class="headerlink" title="Aliases"></a>Aliases</h3><p>Git doesn’t infer your command if you type it in partially. If you don’t want to type the entire text of each of the Git commands, you can easily set up an alias for each command using <code>git config</code>.</p><p>Note: the global settings (<code>git config --global</code>) are stored in ~/.gitconfig. 
Here are my simple aliases.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line">    [user]</span><br><span class="line">        email = wenzhong.work@gmail.com</span><br><span class="line">        name = fwz</span><br><span class="line">    [core]</span><br><span class="line">        editor = /usr/local/bin/vim</span><br><span class="line">        whitespace = trailing-space,space-before-tab</span><br><span class="line">    [alias]</span><br><span class="line">        ci = commit</span><br><span class="line">        co = checkout</span><br><span class="line">        st = status</span><br><span class="line">        br = branch</span><br><span class="line">        unstage = reset HEAD --</span><br><span class="line">        last = log -1 HEAD</span><br><span class="line">    [color]</span><br><span class="line">        ui = true</span><br></pre></td></tr></table></figure><h3 id="Diff"><a href="#Diff" class="headerlink" title="Diff"></a>Diff</h3><p><code>git diff</code> can list the differences between the working tree and the index, between the index and a commit, or between the working tree and a commit.</p><p>working tree: your working directory.</p><p>index file (stage): the files in the git index are the files (after <code>git add</code>) that git would commit to the repository if you ran <code>git commit</code>. The index is a bridge between the working tree and a commit.</p><p>commit: the last stage. 
After a commit, all staged changes are checked in to the git repo.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">git diff : show the differences between working tree and index file</span><br><span class="line">git diff --cached : show the differences between index file and commit</span><br><span class="line">git diff HEAD: show the differences between working tree and commit(HEAD means the latest commit)</span><br></pre></td></tr></table></figure><h3 id="Diff-Cont"><a href="#Diff-Cont" class="headerlink" title="Diff Cont."></a>Diff Cont.</h3><p><code>git diff</code> usually prints a lot of changes to stdout. If you want to use <code>less</code> as the default pager, here is one solution.</p><p>For the current project:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git config core.pager &apos;less -r&apos;</span><br></pre></td></tr></table></figure><p>For all projects:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git config --global core.pager &apos;less -r&apos;</span><br></pre></td></tr></table></figure><h3 id="Clone"><a href="#Clone" class="headerlink" title="Clone"></a>Clone</h3><p>If you just want to clone one branch from GitHub, run the following inside an empty repository created with <code>git init</code>:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">git remote add -t $BRANCH -f origin $REMOTE_REPO</span><br><span class="line">git checkout $BRANCH</span><br></pre></td></tr></table></figure><p>If you just want the latest commit rather than the full history, make a shallow clone:</p><figure class="highlight plain"><table><tr><td 
class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git clone --depth=1 $REMOTE_REPO</span><br></pre></td></tr></table></figure><h3 id="List-deleted-Add-files"><a href="#List-deleted-Add-files" class="headerlink" title="List deleted/Add files"></a>List deleted/added files</h3><p>If you want to know when, and in which commit, a file was deleted:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git log --diff-filter=D --summary</span><br></pre></td></tr></table></figure><p>If you want to know when, and in which commit, a file was added:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git log --diff-filter=A --summary</span><br></pre></td></tr></table></figure><h3 id="search-string-from-all-versions-in-git-repos"><a href="#search-string-from-all-versions-in-git-repos" class="headerlink" title="search string from all versions in git repos"></a>search string from all versions in git repos</h3><p>Suppose you are looking for a piece of code, a variable, a function or a file in the repo, but you cannot find it.<br>Maybe it was deleted a long time ago. 
How can you find out which revisions still contain it, and from there when it was deleted and by whom?</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">git rev-list --all|(</span><br><span class="line">      while read revision;  do</span><br><span class="line">      git grep -F &apos;Your search string&apos; $revision</span><br><span class="line">  done</span><br><span class="line">)</span><br></pre></td></tr></table></figure><h1 id="Useful-Materials"><a href="#Useful-Materials" class="headerlink" title="Useful Materials"></a>Useful Materials</h1><ul><li><a href="http://pcottle.github.io/learnGitBranching/" target="_blank" rel="noopener">Learn Git Branching</a></li><li><a href="https://www.atlassian.com/en/git/tutorial" target="_blank" rel="noopener">Atlassian’s Git Tutorial</a></li><li><a href="http://www.youtube.com/watch?feature=player_detailpage&amp;v=ZDR433b0HJY#t=2791s" target="_blank" rel="noopener">Introduction to Git with Scott Chacon of GitHub</a></li></ul>]]></content>
    
    <summary type="html">
    
      &lt;p&gt;&lt;img src=&quot;https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/git-workflow-text.png&quot; alt=&quot;&quot;&gt;&lt;br&gt;My boss told me that my goal for this quarter is to work on Continuous Integration for our current product, and all of a sudden I realized there is a big gap between that goal and my current skills. The first thing that came to mind was: “Ohhh, I am still not quite familiar with Git”. After a short period of panic, I sat down to learn about git. Here are my notes.&lt;/p&gt;
&lt;p&gt;If you think you can learn git from the manual once you know how to branch, commit and merge, you will probably be disappointed. Git is very flexible, but it does some things in a novel way, so a certain understanding of its internals is necessary for mastering it, and is helpful when you look for help in the manual. For example, I had heard many terms such as “HEAD”, “Index”, “Ref” and “Staging Area”, but I could not tell exactly what they were, and I didn’t even know how git works. After some diving, I wrapped up the very basics in this post.&lt;/p&gt;
    
    </summary>
    
      <category term="Engineering" scheme="http://fwz.github.io/categories/Engineering/"/>
    
      <category term="Version Control" scheme="http://fwz.github.io/categories/Engineering/Version-Control/"/>
    
    
      <category term="Git" scheme="http://fwz.github.io/tags/Git/"/>
    
  </entry>
  
  <entry>
    <title>简约至上：交互式设计四策略 读后感</title>
    <link href="http://fwz.github.io/2014/09/12/%E3%80%90%E8%AF%BB%E4%B9%A6%E7%AC%94%E8%AE%B0%E3%80%91%E7%AE%80%E7%BA%A6%E8%87%B3%E4%B8%8A%EF%BC%9A%E4%BA%A4%E4%BA%92%E5%BC%8F%E8%AE%BE%E8%AE%A1%E5%9B%9B%E7%AD%96%E7%95%A5/"/>
    <id>http://fwz.github.io/2014/09/12/【读书笔记】简约至上：交互式设计四策略/</id>
    <published>2014-09-12T05:30:09.000Z</published>
    <updated>2019-05-02T17:20:41.815Z</updated>
    
    <content type="html"><![CDATA[<p><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/Simplicity.png" alt=""></p><p>在重构一个遗留前端系统的时候，我觉得需要有一些指导原则来引领我做设计。正好看到了这本书，摘录一些有益的观点。</p><h2 id="普适观点"><a href="#普适观点" class="headerlink" title="普适观点"></a>普适观点</h2><ul><li>人们喜欢简单、值得信赖、适应性强的产品</li><li>考虑大多数用户的体验，让他们觉得产品井然有序，轻松自在。他们正在掌控着一切。</li><li>改变会产生影响，需要有办法（最好是公式）来衡量究竟是正面影响大，还是负面影响大</li><li>描述你的设计</li><li>使产品的设计符合用户使用产品的环境，要意识到影响用户体验的因素极多</li><li>简单的用户体验是初学者、新手的体验，或者是压力之下的主流用户的体验</li><li>想要实现简单的体验，需要将目标定得极端，这样能保持产品迭代朝着正确方向前进。例如将目标定位“瞬间响应”而不是“快速响应”，这样我们能在开发新功能时时刻提醒自己。</li></ul><h2 id="实现简化的4个策略："><a href="#实现简化的4个策略：" class="headerlink" title="实现简化的4个策略："></a>实现简化的4个策略：</h2><h3 id="删除"><a href="#删除" class="headerlink" title="删除"></a>删除</h3><ul><li>去掉不必要的功能，直到不能再减</li></ul><h3 id="组织"><a href="#组织" class="headerlink" title="组织"></a>组织</h3><ul><li>按照有意义的标准将他们划分成不同的组</li></ul><h3 id="隐藏"><a href="#隐藏" class="headerlink" title="隐藏"></a>隐藏</h3><ul><li>隐藏不是最重要的功能，避免分散用户注意力</li></ul><h3 id="转移"><a href="#转移" class="headerlink" title="转移"></a>转移</h3><ul><li>将复杂性转移到其它地方。例如遥控器保留具备最基本功能的按钮，而将其它控制放到电视屏幕的菜单上</li></ul><h2 id="对待客户需求"><a href="#对待客户需求" class="headerlink" title="对待客户需求"></a>对待客户需求</h2><ul><li>不要简单地因为客户要求就增加功能，应该对用户的要求做逆向工程——搞清楚用户到底遇到的问题是什么，然后仔细斟酌这个问题是不是应该由软件来解决。</li><li>增加功能不一定能让用户体验更简单，反而经常导致更多的迷惑。</li></ul><blockquote><p>Simplicity is the ultimate sophistication<br>                              – Leonardo Da Vinci</p></blockquote>]]></content>
    
    <summary type="html">
    
      
      
        &lt;p&gt;&lt;img src=&quot;https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/img/blog/Simplicity.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;在重构一个遗留前端系统的时候，我觉得需要有一些指导原则来引领我
      
    
    </summary>
    
      <category term="Read &amp; Learn" scheme="http://fwz.github.io/categories/Read-Learn/"/>
    
      <category term="Book Review" scheme="http://fwz.github.io/categories/Read-Learn/Book-Review/"/>
    
    
      <category term="Reading" scheme="http://fwz.github.io/tags/Reading/"/>
    
  </entry>
  
  <entry>
    <title>Sync two Git remote repositories</title>
    <link href="http://fwz.github.io/2014/08/29/syncing-two-remote-repository/"/>
    <id>http://fwz.github.io/2014/08/29/syncing-two-remote-repository/</id>
    <published>2014-08-28T16:03:20.000Z</published>
    <updated>2016-09-05T03:39:54.000Z</updated>
    
    <content type="html"><![CDATA[<p>At Yahoo we use Gerrit as our code review tool. Engineers commit code changes to Gerrit for review. After the code has been reviewed by peers, Gerrit helps push it to Github. However, sometimes bad things happen. </p><p>For example, suppose a committer forgets to set up the Gerrit environment, and the code is committed and pushed to Github directly (because he has permission to do so). He will notice this soon, because future changes from Gerrit might break. He then tries to submit a review for the missing commit to Gerrit, using “git commit --amend” to generate a change-id (which is used by Gerrit). Because “--amend” generates a different commit-id, the commit-ids in the two remote repos (Gerrit and Github) differ after the review passes, even though the content is the same. As a result, future reviews still cannot be pushed from Gerrit to Github, and sometimes new reviews cannot even be submitted. </p><p>So how could we fix it? It would be a good idea to push changes from Github to Gerrit to get them in sync. Here are two options.</p><a id="more"></a><h2 id="first-approach"><a href="#first-approach" class="headerlink" title="first approach"></a>first approach</h2><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">git pull origin master</span><br><span class="line">git push gerrit master --force</span><br></pre></td></tr></table></figure><p>Usually, <code>git push</code> refuses to update a remote ref that is not an ancestor of the local ref used to overwrite it. The <code>--force</code> flag disables this check, so the latest version from “origin” can now be pushed to “gerrit”. 
This approach is straightforward, but any commits that exist only on “gerrit” will be lost.</p><h2 id="second-approach"><a href="#second-approach" class="headerlink" title="second approach"></a>second approach</h2><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">git reset --hard gerrit/master</span><br><span class="line">git merge origin/master</span><br><span class="line">git push gerrit master</span><br></pre></td></tr></table></figure><p>In this approach, we first reset the local branch to “gerrit/master”, then merge in “origin/master”. This assumes both remote-tracking branches are up to date, for example after a <code>git fetch</code> from each remote. Because the resulting merge commit has the heads of both “gerrit/master” and “origin/master” as parents, it descends from both histories, so both “gerrit” and “origin” should accept it. After we push the merge to “gerrit”, “gerrit” is able to push it to “origin”, and the two repositories are back in sync. No commits are lost this way.</p>]]></content>
    
    <summary type="html">
    
      &lt;p&gt;At Yahoo we use Gerrit as our code review tool. Engineers submit code changes to Gerrit for review, and after the code has been reviewed by peers, Gerrit pushes it to GitHub. However, sometimes things go wrong. &lt;/p&gt;
&lt;p&gt;For example, suppose a committer forgets to set up the Gerrit environment and pushes a commit directly to GitHub (because they have permission to do so). They will notice soon, because later changes pushed from Gerrit might fail. They then try to submit the missing commit to Gerrit for review, using “git commit --amend” to generate a Change-Id (which Gerrit uses). Because “--amend” produces a different commit ID, after the review passes the commit IDs in the two remote repositories (Gerrit and GitHub) differ even though the content is the same. As a result, later reviews still cannot be pushed from Gerrit to GitHub, and sometimes new reviews cannot be submitted. &lt;/p&gt;
&lt;p&gt;So how can we fix it? A good approach is to push the changes from GitHub to Gerrit to bring the two back in sync. Here are two options.&lt;/p&gt;
    
    </summary>
    
      <category term="Engineering" scheme="http://fwz.github.io/categories/Engineering/"/>
    
      <category term="Version Control" scheme="http://fwz.github.io/categories/Engineering/Version-Control/"/>
    
    
      <category term="Git" scheme="http://fwz.github.io/tags/Git/"/>
    
  </entry>
  
  <entry>
    <title>Recommended workflows in Alfred</title>
    <link href="http://fwz.github.io/2014/07/13/Recommended-workflows-in-Alfred/"/>
    <id>http://fwz.github.io/2014/07/13/Recommended-workflows-in-Alfred/</id>
    <published>2014-07-13T15:06:52.000Z</published>
    <updated>2019-05-02T17:20:41.777Z</updated>
    
    <content type="html"><![CDATA[<p><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/alfred_workflow.png" alt=""></p><p>I finally purchased Alfred for its workflows.</p><p>A <a href="http://support.alfredapp.com/workflows" target="_blank" rel="noopener">Workflow</a> is a combination of actions and the killer feature of the Alfred Powerpack. In Alfred, we can import existing workflows or create our own to run a series of actions, which can dramatically improve productivity.</p><h2 id="My-workflow-lists"><a href="#My-workflow-lists" class="headerlink" title="My workflow lists"></a>My workflow list</h2><p>Before writing this post, I spent about half an hour on <a href="http://www.alfredworkflow.com/" target="_blank" rel="noopener">Alfred Workflows</a> going through all the existing workflows, which are based on <a href="https://github.com/hzlzh/AlfredWorkflow.com" target="_blank" rel="noopener">the AlfredWorkflow repo</a> by <a href="https://github.com/hzlzh" target="_blank" rel="noopener">hzlzh</a>, and selected the following workflows as enhancements to Alfred.</p><p>There is <a href="https://github.com/zenorocha/alfred-workflows" target="_blank" rel="noopener">another workflow repo</a> by <a href="https://github.com/zenorocha/" target="_blank" rel="noopener">@zenorocha</a>.</p><p>And here is my list.</p><a id="more"></a><table><thead><tr><th>Workflow</th><th>Usage</th></tr></thead><tbody><tr><td><a href="http://www.com/wp-admin/admin-ajax.php?action=cfdb-file&amp;s=1364124536.5170&amp;form=Alfred+Workflow&amp;field=workflow_file" target="_blank" rel="noopener">Add-task-to-Things</a></td><td>Use “task” to add a task to Things.</td></tr><tr><td><a href="http://www.com/wp-admin/admin-ajax.php?action=cfdb-file&amp;s=1363963308.9955&amp;form=Alfred+Workflow&amp;field=workflow_file" target="_blank" rel="noopener">Adium</a></td><td>Use “im {User}” to search for an online user in Adium</td></tr><tr><td><a 
href="http://www.com/wp-admin/admin-ajax.php?action=cfdb-file&amp;s=1367327572.5345&amp;form=Alfred+Workflow&amp;field=workflow_file" target="_blank" rel="noopener">AlfredTweet-2</a></td><td>Use “tweet” to send tweets</td></tr><tr><td><a href="http://www.com/wp-admin/admin-ajax.php?action=cfdb-file&amp;s=1367748340.1045&amp;form=Alfred+Workflow&amp;field=workflow_file" target="_blank" rel="noopener">Baidu-Map</a></td><td>Use “bmap” to search for a location</td></tr><tr><td><a href="http://www.com/wp-admin/admin-ajax.php?action=cfdb-file&amp;s=1398320736.2336&amp;form=Alfred+Workflow&amp;field=workflow_file" target="_blank" rel="noopener">Baidu_Search</a></td><td>Use “bd” to search on Baidu</td></tr><tr><td><a href="http://www.com/wp-admin/admin-ajax.php?action=cfdb-file&amp;s=1366731149.1667&amp;form=Alfred+Workflow&amp;field=workflow_file" target="_blank" rel="noopener">Dash</a></td><td>Use “dash” to search libraries/methods in Dash</td></tr><tr><td><a href="http://www.com/wp-admin/admin-ajax.php?action=cfdb-file&amp;s=1364174860.3977&amp;form=Alfred+Workflow&amp;field=workflow_file" target="_blank" rel="noopener">Dianping</a></td><td>Use “dianping {merchants}” to search for merchants</td></tr><tr><td><a href="http://www.com/wp-admin/admin-ajax.php?action=cfdb-file&amp;s=1364049016.0184&amp;form=Alfred+Workflow&amp;field=workflow_file" target="_blank" rel="noopener">Douban</a></td><td>Use “book / movie / music” to search for items on douban.com</td></tr><tr><td><a href="http://www.com/wp-admin/admin-ajax.php?action=cfdb-file&amp;s=1364334384.6181&amp;form=Alfred+Workflow&amp;field=workflow_file" target="_blank" rel="noopener">Drop-in-Pocket</a></td><td>Use “pocket {URL}” to save a webpage to Pocket</td></tr><tr><td><a href="http://www.com/wp-admin/admin-ajax.php?action=cfdb-file&amp;s=1400849236.8030&amp;form=Alfred+Workflow&amp;field=workflow_file" target="_blank" rel="noopener">Eject</a></td><td>Use “eject” to eject all ejectable devices</td></tr><tr><td><a 
href="http://www.com/wp-admin/admin-ajax.php?action=cfdb-file&amp;s=1364052677.4068&amp;form=Alfred+Workflow&amp;field=workflow_file" target="_blank" rel="noopener">Evernote</a></td><td>Use “en {item}” to search for an item in Evernote. Use “ennew” to create a new note in Evernote.</td></tr><tr><td><a href="http://www.com/wp-admin/admin-ajax.php?action=cfdb-file&amp;s=1364033722.1626&amp;form=Alfred+Workflow&amp;field=workflow_file" target="_blank" rel="noopener">Kill+Process</a></td><td>Use “kill {process name}” to kill a process</td></tr><tr><td><a href="http://www.com/wp-admin/admin-ajax.php?action=cfdb-file&amp;s=1364886080.2728&amp;form=Alfred+Workflow&amp;field=workflow_file" target="_blank" rel="noopener">Maven</a></td><td>Use “mvn” to find packages in Maven</td></tr><tr><td><a href="http://www.com/wp-admin/admin-ajax.php?action=cfdb-file&amp;s=1387390019.3115&amp;form=Alfred+Workflow&amp;field=workflow_file" target="_blank" rel="noopener">NPM-Search</a></td><td>Use “npm” to search for Node modules</td></tr><tr><td><a href="http://www.com/wp-admin/admin-ajax.php?action=cfdb-file&amp;s=1368972251.0988&amp;form=Alfred+Workflow&amp;field=workflow_file" target="_blank" rel="noopener">Node.js-docs</a></td><td>Use “nodejs” to look up Node libraries</td></tr><tr><td><a href="http://www.com/wp-admin/admin-ajax.php?action=cfdb-file&amp;s=1366229643.9780&amp;form=Alfred+Workflow&amp;field=workflow_file" target="_blank" rel="noopener">Open-Airdrop</a></td><td>Use “airdrop” to activate AirDrop</td></tr><tr><td><a href="http://www.com/wp-admin/admin-ajax.php?action=cfdb-file&amp;s=1385112458.2655&amp;form=Alfred+Workflow&amp;field=workflow_file" target="_blank" rel="noopener">pm2.5-alfred</a></td><td>Use “pm2.5” to look up the PM2.5 index. 
(But it seems to use a restricted API.)</td></tr><tr><td><a href="http://www.com/wp-admin/admin-ajax.php?action=cfdb-file&amp;s=1364404552.1464&amp;form=Alfred+Workflow&amp;field=workflow_file" target="_blank" rel="noopener">stackoverflow</a></td><td>Use “st” to search Stack Overflow</td></tr><tr><td><a href="http://www.com/wp-admin/admin-ajax.php?action=cfdb-file&amp;s=1384745053.2139&amp;form=Alfred+Workflow&amp;field=workflow_file" target="_blank" rel="noopener">zhihu</a></td><td>Use “zh” to search for questions or people on Zhihu. Use “zhdaily” to search Zhihu Daily</td></tr><tr><td><a href="https://raw.githubUsercontent.com/hzlzh/AlfredWorkflow.com/master/Downloads/Workflows/%E6%9C%89%E9%81%93%E7%BF%BB%E8%AF%91" target="_blank" rel="noopener">有道翻译</a></td><td>Use “yd {terms}” to look up translations</td></tr></tbody></table><h2 id="Install-workflows"><a href="#Install-workflows" class="headerlink" title="Install workflows"></a>Install workflows</h2><p>Double-click the downloaded workflow files.</p><h2 id="Some-other-tricks-on-Alfred"><a href="#Some-other-tricks-on-Alfred" class="headerlink" title="Some other tricks on Alfred"></a>Some other tricks on Alfred</h2><ul><li>Use it as a calculator, e.g. 3*2</li><li>Start the input with = to use the advanced calculator, e.g. “=log(3)” or “=sin(1)” </li><li>Use “&gt; {command}” to run a command in the terminal.</li><li>You can also choose which terminal to use (Terminal or iTerm)</li></ul><p>I will post an update later on how much time Alfred has saved me :)</p>]]></content>
    
    <summary type="html">
    
      &lt;p&gt;&lt;img src=&quot;https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/alfred_workflow.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;I finally purchased Alfred for its workflows.&lt;/p&gt;
&lt;p&gt;A &lt;a href=&quot;http://support.alfredapp.com/workflows&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;Workflow&lt;/a&gt; is a combination of actions and the killer feature of the Alfred Powerpack. In Alfred, we can import existing workflows or create our own to run a series of actions, which can dramatically improve productivity.&lt;/p&gt;
&lt;h2 id=&quot;My-workflow-lists&quot;&gt;&lt;a href=&quot;#My-workflow-lists&quot; class=&quot;headerlink&quot; title=&quot;My workflow lists&quot;&gt;&lt;/a&gt;My workflow list&lt;/h2&gt;&lt;p&gt;Before writing this post, I spent about half an hour on &lt;a href=&quot;http://www.alfredworkflow.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;Alfred Workflows&lt;/a&gt; going through all the existing workflows, which are based on &lt;a href=&quot;https://github.com/hzlzh/AlfredWorkflow.com&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;the AlfredWorkflow repo&lt;/a&gt; by &lt;a href=&quot;https://github.com/hzlzh&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;hzlzh&lt;/a&gt;, and selected the following workflows as enhancements to Alfred.&lt;/p&gt;
&lt;p&gt;There is &lt;a href=&quot;https://github.com/zenorocha/alfred-workflows&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;another workflow repo&lt;/a&gt; by &lt;a href=&quot;https://github.com/zenorocha/&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;@zenorocha&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;And here is my list.&lt;/p&gt;
    
    </summary>
    
      <category term="Engineering" scheme="http://fwz.github.io/categories/Engineering/"/>
    
      <category term="Productivity" scheme="http://fwz.github.io/categories/Engineering/Productivity/"/>
    
    
      <category term="MAC" scheme="http://fwz.github.io/tags/MAC/"/>
    
      <category term="Alfred" scheme="http://fwz.github.io/tags/Alfred/"/>
    
  </entry>
  
  <entry>
    <title>Apache Pig in Practice 1</title>
    <link href="http://fwz.github.io/2014/07/04/Apache-Pig-in-Practice-1/"/>
    <id>http://fwz.github.io/2014/07/04/Apache-Pig-in-Practice-1/</id>
    <published>2014-07-03T17:27:43.000Z</published>
    <updated>2019-05-02T17:20:41.741Z</updated>
    
    <content type="html"><![CDATA[<p><img src="https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/pig1.png" alt=""><br>I have written many Pig scripts in the past few months and explored some tricks with my buddies. I hope they can help someone.</p><p>Let’s focus on some interesting topics in this first article and get prepared for the later Pig rush.</p><h2 id="IDE-amp-Environment"><a href="#IDE-amp-Environment" class="headerlink" title="IDE &amp; Environment"></a>IDE &amp; Environment</h2><h3 id="Vim"><a href="#Vim" class="headerlink" title="Vim"></a>Vim</h3><p>I use Vim for most scripting languages, and these are my favourite plugins for writing Pig:</p><ul><li><a href="http://www.vim.org/scripts/script.php?script_id=2186" target="_blank" rel="noopener">Pig Syntax Highlight</a>. Last updated in June 2014; supports Pig 0.12.</li><li><a href="https://github.com/Valloric/YouCompleteMe" target="_blank" rel="noopener">YouCompleteMe</a>. The best auto-complete plugin ever. If you don’t use a Mac, <a href="https://github.com/ervandew/supertab" target="_blank" rel="noopener">Supertab</a> is also a reasonable choice.</li><li><a href="https://github.com/godlygeek/tabular" target="_blank" rel="noopener">Tabularize</a>. Aligns the Pig code and keeps it clean. The most common usage is <code>:Tab/AS</code> to align a <code>FOREACH ... GENERATE</code> clause.</li></ul><p>To debug more efficiently, I like to run Pig with a shortcut. 
Here is my simple approach: add the following to <code>.vimrc</code> for a quick run with <code>F5</code><br><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line">map &lt;F5&gt; :call Compile_Run()&lt;CR&gt;</span><br><span class="line">function Compile_Run()</span><br><span class="line">    if &amp;filetype==&quot;coffee&quot;</span><br><span class="line">        :w</span><br><span class="line">        !coffee % 2&gt;&amp;1</span><br><span class="line">    elseif &amp;filetype==&quot;cpp&quot;</span><br><span class="line">        :w</span><br><span class="line">        !g++ -g -o %&lt; %; ./%&lt;</span><br><span class="line">    elseif &amp;filetype==&quot;python&quot;</span><br><span class="line">        :w</span><br><span class="line">        !python %</span><br><span class="line">    elseif &amp;filetype==&quot;pig&quot;</span><br><span class="line">        :w</span><br><span class="line">        !./run_pig.sh %</span><br><span class="line">    endif</span><br><span class="line">endfunction</span><br></pre></td></tr></table></figure></p><p>Revise <code>run_pig.sh</code> as you like. 
The general idea is to reduce repetitive work and typos.</p><a id="more"></a><p>The basic form would be:<br><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">pig -x local $1</span><br></pre></td></tr></table></figure></p><p>or, with default settings for local debugging:<br><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">input=./input.txt</span><br><span class="line">output=./output</span><br><span class="line">rm -rf $&#123;output?&#125;</span><br><span class="line">pig -x local -Dinput=$&#123;input&#125; -Doutput=$&#123;output&#125; $1</span><br></pre></td></tr></table></figure></p><h2 id="Modulize-your-Pig-code-using-Marco"><a href="#Modulize-your-Pig-code-using-Marco" class="headerlink" title="Modulize your Pig code using Marco"></a>Modularize your Pig code using macros</h2><h3 id="Under-standing-Marco"><a href="#Under-standing-Marco" class="headerlink" title="Under standing Marco"></a>Understanding macros</h3><p>We have mixed feelings about macros, but overall I like them.</p><p>Macros help you organize and reuse your code. A macro in Pig is much like a macro in C: it works by substitution. Suppose you want to run the same series of operations on three datasets… Try refactoring it into a macro, and you will thank yourself later.</p><p>Understanding the <code>$</code> sign is important when using macros. 
<code>$</code> marks the names that will be substituted.</p><figure class="highlight plain"><figcaption><span>number1.txt</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td></tr></table></figure><figure class="highlight plain"><figcaption><span>filter.marco</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">DEFINE filter_small_number (events, threshold) RETURNS filtered_events &#123;</span><br><span class="line">    $filtered_events = FILTER $events BY a &gt; $threshold;</span><br><span class="line">&#125;;</span><br></pre></td></tr></table></figure><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">IMPORT &apos;filter.marco&apos;;</span><br><span class="line"></span><br><span class="line">events = LOAD &apos;./data/number1.txt&apos; AS (a:int);</span><br><span class="line"></span><br><span class="line">big_number = filter_small_number(events, 2);</span><br><span class="line"></span><br><span class="line">DUMP big_number;</span><br></pre></td></tr></table></figure><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">(3)</span><br><span 
class="line">(4)</span><br><span class="line">(5)</span><br></pre></td></tr></table></figure><p>Easy. </p><p>However, you had better not modify an input inside a macro. Macros are just substitution, so every change to an input variable is global.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">IMPORT &apos;filter.marco&apos;;</span><br><span class="line"></span><br><span class="line">events = LOAD &apos;./data/number1.txt&apos; AS (a:int);</span><br><span class="line"></span><br><span class="line">big_number = filter_small_number(events, 2);</span><br><span class="line"></span><br><span class="line">DUMP big_number;</span><br><span class="line">DUMP events;</span><br></pre></td></tr></table></figure><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">DEFINE filter_small_number (events, threshold) RETURNS filtered_events &#123;</span><br><span class="line">    $filtered_events = FILTER $events BY a &gt; $threshold;</span><br><span class="line">    $events = FILTER $events BY a == 4;</span><br><span class="line">&#125;;</span><br></pre></td></tr></table></figure><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">big_number</span><br><span class="line">(3)</span><br><span 
class="line">(4)</span><br><span class="line">(5)</span><br><span class="line"></span><br><span class="line">events</span><br><span class="line">(4)</span><br></pre></td></tr></table></figure><p>You can also return multiple datasets from a macro:<br><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">DEFINE split_events (events, threshold) RETURNS big, small &#123;</span><br><span class="line">    $big = FILTER $events BY a &gt;= $threshold;</span><br><span class="line">    $small = FILTER $events BY a &lt; $threshold;</span><br><span class="line">&#125;;</span><br><span class="line"></span><br><span class="line">events = LOAD &apos;./data/number1.txt&apos; AS (a:int);</span><br><span class="line"></span><br><span class="line">big_num, small_num = split_events(events, 3);</span><br><span class="line"></span><br><span class="line">DUMP big_num;</span><br><span class="line">DUMP small_num;</span><br><span class="line"></span><br></pre></td></tr></table></figure></p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">big</span><br><span class="line">(3)</span><br><span class="line">(4)</span><br><span class="line">(5)</span><br><span class="line">small</span><br><span class="line">(1)</span><br><span 
class="line">(2)</span><br></pre></td></tr></table></figure><h3 id="What’s-not-so-cool"><a href="#What’s-not-so-cool" class="headerlink" title="What’s not so cool"></a>What’s not so cool</h3><p>One reason we love macros less is that once a macro is expanded into the Pig script, error messages become harder to read and trace to the root cause, because the line numbers refer to the reassembled Pig script. However, macros are still a great tool and a must-have skill for using Pig.</p><h1 id="INPUT-and-OUTPUT"><a href="#INPUT-and-OUTPUT" class="headerlink" title="INPUT and OUTPUT"></a>INPUT and OUTPUT</h1><h2 id="INPUT"><a href="#INPUT" class="headerlink" title="INPUT"></a>INPUT</h2><ul><li>You can load almost anything: HDFS / Avro / Protobuf / Hive / Elasticsearch / MongoDB.</li></ul><h2 id="OUTPUT"><a href="#OUTPUT" class="headerlink" title="OUTPUT"></a>OUTPUT</h2><ul><li>Of course, there are matching storage functions for the persistence / serialization tools above.</li><li>MultiStorage helps you store data hierarchically, which means you can partition the result when storing it. An absolute must-know feature.</li></ul><h1 id="Third-party-Pig-library"><a href="#Third-party-Pig-library" class="headerlink" title="Third party Pig library"></a>Third-party Pig libraries</h1><ul><li><a href="https://cwiki.apache.org/confluence/display/PIG/PiggyBank" target="_blank" rel="noopener">piggybank</a></li><li><a href="http://data.linkedin.com/opensource/datafu" target="_blank" rel="noopener">DataFu</a> from LinkedIn</li><li><a href="https://github.com/twitter/elephant-bird/" target="_blank" rel="noopener">ElephantBird</a> from Twitter</li><li><a href="http://hortonworks.com/hadoop/hcatalog/" target="_blank" rel="noopener">HCatalog</a></li></ul><h1 id="UDF"><a href="#UDF" class="headerlink" title="UDF"></a>UDF</h1><p>Once you know that you can write UDFs in Python/Ruby/JS, I suppose nobody will use Java for common cases.<br><a href="http://pig.apache.org/docs/r0.9.1/udf.html#python-udfs" 
target="_blank" rel="noopener">Python UDF</a></p><h1 id="Unit-test"><a href="#Unit-test" class="headerlink" title="Unit test"></a>Unit test</h1><h2 id="PigUnit"><a href="#PigUnit" class="headerlink" title="PigUnit"></a>PigUnit</h2><p>Write unit tests to be a good engineer. Of course, Pig scripts can and should be unit-tested. The <a href="http://pig.apache.org/docs/r0.8.1/pigunit.html" target="_blank" rel="noopener">PigUnit</a> framework is supported in Java. However, the docs are limited and you might run into a lot of trouble.</p><h2 id="Unit-test-a-python-UDF"><a href="#Unit-test-a-python-UDF" class="headerlink" title="Unit test a python UDF"></a>Unit test a Python UDF</h2><p>When you use the native <code>unittest</code> package to test the Python script, Python complains about the <code>outputSchema</code> decorator, since Pig is not there to provide it. One way around this is to add Pig support to the Python script; the other is to disable the outputSchema decorator. Here we show the second trick: put this snippet at the top of the UDF.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">if __name__ != &apos;__lib__&apos;: </span><br><span class="line">    def outputSchema(dont_care): </span><br><span class="line">        def wrapper(func): </span><br><span class="line">            def inner(*args, **kwargs): </span><br><span class="line">                return func(*args, **kwargs) </span><br><span class="line">            return inner </span><br><span class="line">        return wrapper </span><br></pre></td></tr></table></figure><p>This block makes it possible to test a UDF that uses the outputSchema decorator. <code>__name__</code> is set to ‘__lib__’ when the script is called by Pig, so the block has no effect when the script runs as a Pig UDF. 
</p><h1 id="References"><a href="#References" class="headerlink" title="References"></a>References</h1><p><a href="https://developer.yahoo.com/blogs/hadoop/comparing-pig-latin-sql-constructing-data-processing-pipelines-444.html" target="_blank" rel="noopener">Comparing Pig Latin and SQL for Constructing Data Processing Pipelines</a> By Alan Gates, Pig Architect in Yahoo.<br><a href="http://shop.oreilly.com/product/0636920018087.do" target="_blank" rel="noopener">Programming Pig</a> also by Alan Gates.<br><a href="http://www.packtpub.com/pig-design-patterns/book" target="_blank" rel="noopener">Pig Design Pattern</a></p>]]></content>
    
    <summary type="html">
    
      &lt;p&gt;&lt;img src=&quot;https://wenzhong-1259152588.cos.ap-beijing.myqcloud.com/pig1.png&quot; alt=&quot;&quot;&gt;&lt;br&gt;I have written many Pig scripts in the past few months and explored some tricks with my buddies. I hope they can help someone.&lt;/p&gt;
&lt;p&gt;Let’s focus on some interesting topics in this first article and get prepared for the later Pig rush.&lt;/p&gt;
&lt;h2 id=&quot;IDE-amp-Environment&quot;&gt;&lt;a href=&quot;#IDE-amp-Environment&quot; class=&quot;headerlink&quot; title=&quot;IDE &amp;amp; Environment&quot;&gt;&lt;/a&gt;IDE &amp;amp; Environment&lt;/h2&gt;&lt;h3 id=&quot;Vim&quot;&gt;&lt;a href=&quot;#Vim&quot; class=&quot;headerlink&quot; title=&quot;Vim&quot;&gt;&lt;/a&gt;Vim&lt;/h3&gt;&lt;p&gt;I use Vim for most scripting languages, and these are my favourite plugins for writing Pig:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://www.vim.org/scripts/script.php?script_id=2186&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;Pig Syntax Highlight&lt;/a&gt;. Last updated in June 2014; supports Pig 0.12.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/Valloric/YouCompleteMe&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;YouCompleteMe&lt;/a&gt;. The best auto-complete plugin ever. If you don’t use a Mac, &lt;a href=&quot;https://github.com/ervandew/supertab&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;Supertab&lt;/a&gt; is also a reasonable choice.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/godlygeek/tabular&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;Tabularize&lt;/a&gt;. Aligns the Pig code and keeps it clean. The most common usage is &lt;code&gt;:Tab/AS&lt;/code&gt; to align a &lt;code&gt;FOREACH ... GENERATE&lt;/code&gt; clause.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To debug more efficiently, I like to run Pig with a shortcut. Here is my simple approach: add the following to &lt;code&gt;.vimrc&lt;/code&gt; for a quick run with &lt;code&gt;F5&lt;/code&gt;&lt;br&gt;&lt;figure class=&quot;highlight plain&quot;&gt;&lt;table&gt;&lt;tr&gt;&lt;td class=&quot;gutter&quot;&gt;&lt;pre&gt;&lt;span class=&quot;line&quot;&gt;1&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;2&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;3&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;4&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;5&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;6&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;7&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;8&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;9&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;10&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;11&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;12&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;13&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;14&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;15&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;16&lt;/span&gt;&lt;br&gt;&lt;/pre&gt;&lt;/td&gt;&lt;td class=&quot;code&quot;&gt;&lt;pre&gt;&lt;span class=&quot;line&quot;&gt;map &amp;lt;F5&amp;gt; :call Compile_Run()&amp;lt;CR&amp;gt;&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;function Compile_Run()&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;    if &amp;amp;filetype==&amp;quot;coffee&amp;quot;&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;        :w&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;        !coffee % 2&amp;gt;&amp;amp;1&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;    elseif &amp;amp;filetype==&amp;quot;cpp&amp;quot;&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;        :w&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;        !g++ -g -o %&amp;lt; %; ./%&amp;lt;&lt;/span&gt;&lt;br&gt;&lt;span 
class=&quot;line&quot;&gt;    elseif &amp;amp;filetype==&amp;quot;python&amp;quot;&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;        :w&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;        !python %&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;    elseif &amp;amp;filetype==&amp;quot;pig&amp;quot;&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;        :w&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;        !./run_pig.sh %&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;    endif&lt;/span&gt;&lt;br&gt;&lt;span class=&quot;line&quot;&gt;endfunction&lt;/span&gt;&lt;br&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/figure&gt;&lt;/p&gt;
&lt;p&gt;Revise &lt;code&gt;run_pig.sh&lt;/code&gt; as you like. The general idea is to reduce repetitive work and typos.&lt;/p&gt;
    
    </summary>
    
      <category term="Engineering" scheme="http://fwz.github.io/categories/Engineering/"/>
    
      <category term="Big Data" scheme="http://fwz.github.io/categories/Engineering/Big-Data/"/>
    
    
      <category term="Pig" scheme="http://fwz.github.io/tags/Pig/"/>
    
      <category term="Hadoop" scheme="http://fwz.github.io/tags/Hadoop/"/>
    
  </entry>
  
</feed>