Development Retrospective


Background

I split this into a separate post because the retrospective was already long, so I extracted this part from it.


Development Case Retrospective

Contents

I’ve already covered the topics below in the retrospective. Here I want to write more freely, following my train of thought without any specific format.

  • The importance of generalization and standardization.
  • Traces must be left somewhere.
  • Technology is not everything.
  • Resources cannot be completely trusted.
  • What you worry about can happen someday.
  • Distinguishing between DEV, STG, and PROD is important.
  • Testing is very important.
  • Don’t trust people.
  • Even when busy, you must do what’s necessary.
  • Context switching takes a lot of time.
  • The policies of high-level organizations and the circumstances at the time.
  • You need to maintain physical strength and stay mentally focused.


The importance of generalization and standardization.

Since joining my company’s Big Data Center, I have been working on projects involving company-wide data. One of the most significant realizations I’ve had here is the importance of generalization and standardization. The opposite of these is fragmentation.

After becoming part of an enterprise-level organization, I was tasked with building a system to collect data across the company. Before starting, I reviewed previous projects and noticed the challenges they had run into. A major issue was a lack of consistency: data collection methods and formats differed across services, and these discrepancies had never been unified. While reading this, you might ask, “Why?” There were likely reasons, even if I don’t know all of them. I’ll share more thoughts on this later.

Through my experience, I’ve learned that when standardization and generalization are not achieved, fragmentation occurs. As a service grows, managing and maintaining it becomes increasingly difficult. I believe that standardization and generalization are essential as services expand.

For leaders and managers, it’s not enough to simply complete a project; they should aim to generalize as much as possible to ease the workload of the people doing the work and to improve quality. If you simply accept every demand from collaborating parties, it becomes SI (system integration) work, not collaboration. Of course, there may be valid reasons to consider such requests, but unless there is truly no alternative, I think it’s better to postpone them or negotiate adjustments rather than accept everything.

For small services that are easy to manage, say fewer than ten systems, fragmentation may not be much of an issue. However, once a service exceeds this scale and takes on the characteristics of a platform, generalization and standardization become crucial. For a large platform, I believe a higher-level department should set the direction for generalization and standardization.


Traces must be left somewhere.

If the person who developed a feature managed it for life, this discussion might not be necessary. However, unlike programs, the people in charge can change at any time. This made me realize firsthand how important it is to document why certain things were done. There are various ways to do this: writing it up in a document, or leaving descriptive function names and detailed comments in the code. Some argue that comments are a last resort because they compensate for what the code fails to express, but if something cannot be conveyed through code, a comment is still necessary. It saves a newcomer from having to ask, “Why is this done this way?” and then track down someone to explain, and it lets the team handle situations even when the person in charge is unavailable.

Expressing intent as much as possible in the code and documenting it elsewhere when that’s not feasible is crucial for team efficiency.

This year, I was sometimes so busy that such situations did occur. Having experienced it myself, I’ve now developed a habit of leaving records whenever possible.


Technology is not everything.

This is something I also wrote about earlier this year in my retrospective ([Retrospective](EN) The first half retrospective). Since nothing major has changed, I’ve brought it over as is.

It’s not all about technical skills. Something that could be resolved technically in 10 minutes might take 10 days because of emotional issues. In one case, development was already done, but the work still took longer because the service side had introduced a new process. In another case, the owner told us they could not cooperate, so we asked a higher-level manager to get them on board, and after that the issue was resolved. Sometimes problems are resolved not through development but through communication with people.


Resources cannot be completely trusted.

It may be due to my lack of experience developing large-scale services, but I noticed that I tended to take resources for granted while developing. For example, I once assumed there was enough memory when writing code, only to hit an OOM (Out of Memory) error. In another instance, an OOM error occurred because I had set an option too high. Although I’ve used memory as an example, I’ve also run into storage issues, as well as network and database load problems as the service grew.

For instance, I ran into OOM errors while running Pods and bottlenecks caused by frequent database access in Airflow. There were also cases where Pods crashed because options were set too high during infrastructure setup. In addition, some responses were so slow or so large that Swagger became unresponsive. As the service scaled, we hit AWS service quotas several times and had to request increases from AWS.

While there may not be enough bandwidth to worry about memory in the early stages of development, it’s crucial to keep in mind that problems can arise as the service grows. Whenever possible, it’s better to anticipate and prepare for these issues in advance.


What you worry about can happen someday.

There were moments when I thought, “Ah… this could become a problem later,” and those concerns did, in fact, happen.

For instance, there was a part of our cloud setup that automatically generates resources. Initially, I configured its exclusions broadly and later modified them. Although some commented that it looked oddly configured, and it wasn’t documented, the issue I had been worried about actually occurred. In another case, we initially managed infrastructure setup values in a single file. As the service grew, deployments became slower and management became increasingly difficult. Eventually, a team member took the initiative to revise and improve the system, although it required significant effort.

This may seem obvious, like something everyone already knows, but I thought it was worth emphasizing again. It’s better to address these concerns in advance if possible. Of course, that’s easier said than done in practice… haha. Finding the right balance is necessary.


Distinguishing between DEV, STG, and PROD is important.

As the number of components I developed increased, I came to realize firsthand how important it is to keep environments separate.

While testing, there were cases where my work conflicted with parts developed by others, or issues arose during integration testing. These problems occurred because the DEV and STG environments partially overlapped.

Although this may seem obvious, I wanted to note it down as it’s something that can happen in new project environments.


Testing is very important.

This year, I was responsible for testing and CI/CD tasks, and I deeply felt the difference in quality and the level of anxiety between having tests and not having them.

Having unit tests and integration tests provided significant peace of mind during deployments.

I found it incredibly appealing how tests not only improve quality but also reduce anxiety for everyone involved.

While I understand the argument that there’s no time to write tests because of a busy schedule, I still firmly believe that writing tests is essential. They are invaluable when it comes to future development.


Don’t trust people.

It might be misunderstood, but this is not about saying that people are untrustworthy. It’s about the fact that people can make mistakes, so we shouldn’t rely solely on them.

During a collaboration, there was a case where someone promised to handle tasks manually every day, but eventually issues arose, causing a chain reaction of stress. Recently, there have also been instances where data wasn’t provided or was given in a different format when it needed to be collected. In such cases, I believe it’s better to build a system that allows the team to directly access database.

In the end, everything should be automated. It’s better for everyone if we systematize tasks for mutual convenience. For example, when building a data pipeline, a single team should be able to monitor everything from direct database access to data loading, which reduces the need for communication. Another example: when people manually enter passwords or other information, errors can occur, so there should be a system that lets the person entering the data validate it themselves. Lastly, even if people perform all the tests, the final testing should be done by machines so that quality is reliably guaranteed.


Even when busy, you must do what’s necessary.

Some of this overlaps with what I mentioned earlier. This is also a matter of compromise and can vary depending on the situation, but for critical areas like testing, it’s essential to make time to write them.

During a migration, I once overlooked an option, which left me with a large volume of data to handle. I kept postponing it, and eventually the data grew too large and caused issues. If you strongly suspect that something will definitely happen, or that it needs to be done, it’s better to address it right away, even if you’re busy. This also involved the service side, so it should have been handled in advance.

Again, I fully understand that, as mentioned, it’s not easy.


Context switching takes a lot of time.

The “context change” referred to here means a change in tasks. The work you’re assigned could change, or the tasks you need to complete may shift.

This year, frequent interruptions forced me to switch tasks often, which made it hard to focus and hard to get back into the flow when returning to a task.

In the first half of the year, we reorganized our tasks, and starting new work in the middle of a busy period took a lot of time to get up to speed. When tasks change, it seems to take a lot of time to learn everything again. So, if such changes are planned, it’s a good idea to leave enough buffer time.


The policies of high-level organizations and the circumstances at the time.

When working, I often ask myself, “Why was it made like that?” This question doesn’t come from arrogance, but from the feeling that things were built in a somewhat roundabout way. This is similar to the fragmentation I mentioned earlier. When I talked to the people in charge at the time, I found that there were reasons for everything. Sometimes the team lacked organizational power, and decisions had to be made to accommodate political or power-related issues. There were also situations of “If you don’t do it this way, we won’t cooperate.” The requirements kept changing, there was no standardization, and there were many other reasons. In situations like this, factors beyond development can steer the direction of a project and make things harder. There may have been things the engineers missed or couldn’t address, but external influences played a large role as well.

When my team moved into a larger, enterprise-wide organization and a powerful leader publicly announced a change, it became something everyone had to follow, and everyone responded. Even with strong technical capabilities, it wouldn’t have moved forward without support at the enterprise level. From this experience, I learned that in a strong organization, policies need to be created for things to progress, and that’s the right approach. It’s much more efficient for the CEO to issue a company-wide directive than for a director-level executive to try to negotiate.


You need to maintain physical strength and stay mentally focused.

Physical stamina is incredibly important, and it can’t be stressed enough. You need stamina to stay focused: when your energy drops, your concentration drops too, and your efficiency in any task suffers.

In the past, even when issues arose, they weren’t a huge problem. But now there are so many services and users that mistakes are unacceptable, so I have to stay focused every time I work.

When issues arise and the impact is significant, you can’t afford to make mistakes, so it’s important to stay alert. And to maintain that focus, you need physical stamina.


Conclusion

All of this reflects what I’ve felt this year. I wanted to go into more detail and share more specific examples so readers could relate more deeply, but since many of the stories involve the company, I couldn’t elaborate as much as I would have liked. Even so, I believe many of the titles and contents are relatable. Of course, these situations vary with context, so it’s important to think flexibly and respond accordingly. These are not definitive answers.

While my body and mind went through some tough times, I try to think positively because there are still several things I’ve learned.

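A case where the domain’s certificate turned out to be the subdomain’s certificate


Environment

  • Let’s Encrypt
  • Nginx
  • Certbot


Background

  • Out of curiosity, I checked the certificate of a domain I run and found that the domain was serving the subdomain’s certificate, so I fixed it. For example, if the domain is twpower.org, the browser showed the certificate for subdomain.twpower.org.
  • The server runs Nginx, and I had set up the certificates with certbot.


Reason

  • I used Let’s Encrypt certificates set up with certbot, and it turned out each domain needed its own certificate setup. However, I had applied the subdomain along with the domain in a single command.
  • According to the link, a wildcard certificate can cover subdomains as well, but the method looked a bit complicated, so I didn’t apply it.


Fix Method

  • Each domain has to be specified separately. For example, for twpower.org and the subdomain subdomain.twpower.org, apply them separately as below.
  • Alternatively, the method in the link above can cover subdomains too, but this post does not cover it.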
sudo certbot --nginx -d twpower.org -d www.twpower.org
sudo certbot --nginx -d subdomain.twpower.org


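Fix Process

I removed the settings previously created by certbot and set up the certificates again.


Remove the existing settings created by certbot

In /etc/nginx/sites-enabled/, the configuration files are named after the domains, as shown below.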

twpower@twpower-private-server:/etc/nginx/sites-enabled$ ls /etc/nginx/sites-enabled/
subdomain.twpower.org  twpower.org

The parts set by certbot are marked with a comment, as shown below. This is an excerpt.

if ($host = www.twpower.org) {
    return 301 https://$host$request_uri;
} # managed by Certbot


if ($host = twpower.org) {
    return 301 https://$host$request_uri;
} # managed by Certbot

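To remove what had been applied before, I deleted all parts marked # managed by Certbot.

Check the impact on Nginx

sudo nginx -t

Set up the certificates again with certbot

If the Nginx configuration is valid, apply certificates to the domain and the subdomain using the commands below.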

sudo certbot --nginx -d twpower.org -d www.twpower.org
sudo certbot --nginx -d subdomain.twpower.org


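Result

  • Visiting each domain and checking its certificate shows that a different certificate is applied to each.


Opinion

  • These days cloud services handle this kind of work, so I’m not sure I’ll run into it again.
  • Not only do cloud services take care of it, but services like ChatGPT also explain it in detail, so I’m not sure I’ll get the chance to deal with this again.
  • It could be helpful when running a small server like mine.


Reference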

A case where the subdomain’s certificate was applied to the root domain


Environment and Prerequisite

  • Let’s Encrypt
  • Nginx
  • Certbot


Background

  • Out of curiosity, I checked my domain’s certificate and found that the domain was using the subdomain’s certificate, so I fixed it. For example, for the domain twpower.org, the browser showed the certificate of subdomain.twpower.org.
  • The server runs Nginx, and the certificates were set up with certbot.


Reason

  • I used a Let’s Encrypt certificate and set it up with certbot. It turned out that I needed to set up a certificate per domain, but I had set up the domain and subdomain together.
  • The Link shows a way to use a wildcard certificate that also covers subdomains, but I didn’t follow it because it looked complicated.


Fix Method

  • Certificates should be set up per domain. For example, with the domain twpower.org and the subdomain subdomain.twpower.org, run the commands below.
  • Subdomains can also be covered using the method in the Link above, but this post does not include it.
sudo certbot --nginx -d twpower.org -d www.twpower.org
sudo certbot --nginx -d subdomain.twpower.org


Fix Process

I removed the existing settings created by certbot and set up the certificates again.


Remove existing configurations created by certbot

There is one configuration file per domain in /etc/nginx/sites-enabled/.

twpower@twpower-private-server:/etc/nginx/sites-enabled$ ls /etc/nginx/sites-enabled/
subdomain.twpower.org  twpower.org

The parts configured by certbot are marked with a comment, as shown below (excerpt).

if ($host = www.twpower.org) {
    return 301 https://$host$request_uri;
} # managed by Certbot


if ($host = twpower.org) {
    return 301 https://$host$request_uri;
} # managed by Certbot

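To remove the previous settings, I deleted everything marked with # managed by Certbot. For multi-line blocks like the ones above, only the closing line carries the comment, so the whole block has to be removed.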
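Because the marker comment sits only on the last line of multi-line blocks, an automatic "delete matching lines" pass would leave half-blocks behind. A safer routine is to snapshot the directory first and then list every marked line so each surrounding block can be removed by hand. The sketch below assumes GNU coreutils and grep; `backup_sites_dir`, `list_certbot_lines`, and the `.bak-<timestamp>` naming are my own illustrative choices, not something Certbot provides. It dereferences symlinks when copying, since sites-enabled entries are often symlinks into sites-available, and it writes the backup next to the directory rather than inside it, because Debian/Ubuntu's default nginx.conf includes `sites-enabled/*`.

```shell
#!/usr/bin/env bash
# Sketch: back up an Nginx sites directory, then list Certbot-managed lines.
# Function names and the backup naming scheme are illustrative assumptions.

# Copy the directory next to itself with a timestamp suffix.
# -L dereferences symlinks so the backup holds real file contents.
backup_sites_dir() {
  local dir="${1%/}"
  local dest
  dest="${dir}.bak-$(date +%Y%m%d%H%M%S)"
  cp -rL "$dir" "$dest"
  printf '%s\n' "$dest"
}

# Print file:line:text for every line carrying Certbot's marker comment.
# The glob is expanded by the shell, so symlinked files are read too.
list_certbot_lines() {
  local dir="${1%/}"
  # -H: always print the file name, -n: line numbers, -F: fixed string.
  grep -HnF '# managed by Certbot' "$dir"/* || true
}

# On a real server (root privileges needed for the backup):
#   backup_sites_dir /etc/nginx/sites-enabled
#   list_certbot_lines /etc/nginx/sites-enabled
```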

Check the Nginx configuration

sudo nginx -t

Set Certificate Using certbot

If the Nginx configuration is valid, run the commands below to apply certificates to the domain and the subdomain.

sudo certbot --nginx -d twpower.org -d www.twpower.org
sudo certbot --nginx -d subdomain.twpower.org


Result

  • Each domain now serves its own certificate.
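To confirm the result without a browser, one option is to ask the server for its certificate with OpenSSL. The `-servername` flag matters here: when several sites share one IP, Nginx uses the SNI name to choose a server block, and therefore a certificate. This is a sketch assuming OpenSSL 1.1.1 or newer (needed for `x509 -ext`); `cert_names` and `served_cert_names` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch for checking which certificate a hostname is served with.
# Assumes OpenSSL 1.1.1+ (for `x509 -ext`); function names are hypothetical.

# Print the subject and Subject Alternative Names of a PEM certificate on stdin.
cert_names() {
  openssl x509 -noout -subject -ext subjectAltName
}

# Connect to host:443 and print the names on the certificate it presents.
# -servername sends SNI, which Nginx uses to pick the server block
# (and therefore the certificate) when several sites share one IP.
served_cert_names() {
  local host="$1"
  openssl s_client -connect "${host}:443" -servername "$host" </dev/null 2>/dev/null \
    | cert_names
}

# Usage against a live server (network access required):
#   for h in twpower.org www.twpower.org subdomain.twpower.org; do
#     echo "== $h"; served_cert_names "$h"
#   done
```

After the fix, twpower.org and subdomain.twpower.org should report different subjects and SAN lists.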


Opinion

  • These days, cloud services handle these tasks, so I may not encounter this situation again.
  • Not only do cloud services handle this, but we also live in an era where services like ChatGPT provide detailed instructions, so I might not have the chance to encounter this again.
  • For people like me running a small server, this could still be helpful.


Reference

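Setting default values when not using workflow_dispatch and workflow_call


Environment

  • GitHub Actions


Background

  • In GitHub Actions, some steps used the inputs defined in workflow_dispatch and workflow_call, but those inputs were unavailable when the workflow was triggered for other reasons, such as push. Setting a default value looked like a way to handle this, so I looked it up and wrote it down.


Solution

  • When inputs are unavailable because the workflow was triggered by something other than workflow_dispatch or workflow_call, such as push, you can set a default value.
  • If the workflow ran via workflow_dispatch or workflow_call, the value given in inputs is used; otherwise, the default value after || is used.
  • The default value is stored in an environment variable, and a later step reads that environment variable.

Code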

name: Workflow Return Test
on:
  push:
    branches: [ main ]
  workflow_call:
    inputs:
      key:
        type: string
  workflow_dispatch:
    inputs:
      key:
        type: string
jobs:
  print_default_value_test:
    name: print value
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set default value
        run: |
          echo "VALUE_NAME=${{ inputs.key || 'default value' }}" >> "$GITHUB_ENV"
      - name: Print value
        run: echo "${{ env.VALUE_NAME }}"

Result

  • When triggered by push
Run echo "default value"
  echo "default value"
  shell: /usr/bin/bash -e {0}
  env:
    VALUE_NAME: default value
default value
  • When triggered by workflow_dispatch or workflow_call with the string value “test input value”
Run echo "test input value"
  echo "test input value"
  shell: /usr/bin/bash -e {0}
  env:
    VALUE_NAME: test input value
test input value


Opinion

I’m not sure whether this is the recommended approach, but I used it because it came up when I searched.


Reference

Setting default value when not using workflow_dispatch and workflow_call


Environment and Prerequisite

  • GitHub Actions


Background

  • In GitHub Actions, the inputs defined in workflow_dispatch and workflow_call could not be used when the workflow was triggered by other events such as push. Setting a default value seemed like a possible solution, so I wrote this up.


Solution

  • When inputs are unavailable because the workflow was not triggered by workflow_dispatch or workflow_call (for example, on push), you can set a default value as shown below.
  • For workflows triggered by workflow_dispatch or workflow_call, the values specified in inputs are used. Otherwise, the default value following || is used.
  • The default value is saved to an environment variable, which a later step reads.

Code

name: Workflow Return Test
on:
  push:
    branches: [ main ]
  workflow_call:
    inputs:
      key:
        type: string
  workflow_dispatch:
    inputs:
      key:
        type: string
jobs:
  print_default_value_test:
    name: print value
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set default value
        run: |
          echo "VALUE_NAME=${{ inputs.key || 'default value' }}" >> "$GITHUB_ENV"
      - name: Print value
        run: echo "${{ env.VALUE_NAME }}"
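The `>> $GITHUB_ENV` line is what carries the value from the "Set default value" step to the "Print value" step. On the runner, `GITHUB_ENV` holds the path of a file; each `NAME=value` line appended to it becomes an environment variable for the steps that run afterwards, but not for the step that wrote it. The sketch below is a simplified local emulation of that hand-off using a temporary file, assuming single-line values. `run_steps_locally` is a hypothetical name, and Bash's `${key:-default value}` stands in for the `inputs.key || 'default value'` expression.

```shell
#!/usr/bin/env bash
# Simplified local emulation of how GITHUB_ENV passes values between steps.
# Assumes single-line NAME=value entries; run_steps_locally is hypothetical.

run_steps_locally() {
  local key="${1-}"
  local env_file
  env_file="$(mktemp)"

  # Step 1 ("Set default value"): append NAME=value to the env file.
  # ${key:-default value} stands in for `inputs.key || 'default value'`.
  (
    GITHUB_ENV="$env_file"
    echo "VALUE_NAME=${key:-default value}" >> "$GITHUB_ENV"
  )

  # Step 2 ("Print value"): start from the entries the previous step wrote,
  # the way the runner makes them available to later steps.
  (
    while IFS='=' read -r name value; do
      export "${name}=${value}"
    done < "$env_file"
    echo "$VALUE_NAME"
  )

  rm -f "$env_file"
}

# run_steps_locally ""                  # prints: default value
# run_steps_locally "test input value"  # prints: test input value
```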

Result

  • Triggered by push
Run echo "default value"
  echo "default value"
  shell: /usr/bin/bash -e {0}
  env:
    VALUE_NAME: default value
default value
  • Triggered by workflow_dispatch or workflow_call with the string value “test input value”
Run echo "test input value"
  echo "test input value"
  shell: /usr/bin/bash -e {0}
  env:
    VALUE_NAME: test input value
test input value


Opinion

I’m not sure whether this is the recommended approach, but I used it because I found it by searching.
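One variation worth considering, which is not the method used above, is to pass the input to the step through `env:` (for example `KEY: ${{ inputs.key }}`) and let Bash fill in the default with `${KEY:-default value}`. Like the `||` expression, this falls back when the value is missing or an empty string. GitHub's security hardening guidance also suggests routing values through intermediate environment variables instead of interpolating `${{ }}` straight into scripts. The helper below is hypothetical and only isolates the fallback rule so it can be tried locally.

```shell
#!/usr/bin/env bash
# Hypothetical helper isolating the "missing or empty -> default" rule.
# In a workflow step, the first argument would come from an env var such as
# KEY, set via `env: KEY: ${{ inputs.key }}` (an assumption for this sketch).

resolve_value() {
  local input="${1-}"
  local fallback="${2:-default value}"
  # ${input:-$fallback} expands to $fallback when input is unset or empty.
  printf '%s\n' "${input:-$fallback}"
}

# resolve_value ""                  # prints: default value (like a push run)
# resolve_value "test input value"  # prints: test input value
# Inside a run step, the same rule can be written inline:
#   echo "VALUE_NAME=${KEY:-default value}" >> "$GITHUB_ENV"
```

Either way, note that an explicitly empty input also falls back to the default.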


Reference