Joao Junior

Composition, Inheritance, and Liskov Substitution Principle

Posted on April 22, 2025 | 5 minutes | 865 words

In software development, organizing our codebase so that it is easy to maintain is always good practice, and the idea is not new. I think it goes back at least to 1968, when the term "software crisis" was coined. Since then, we have developed many principles to help us achieve this goal. One person who has made many contributions to this field is Barbara Liskov. She is best known for the Liskov Substitution Principle (LSP), but she also introduced the abstract data type (ADT), the concept at the core of classes and object-oriented programming. Here, I want to compare composition with inheritance and show how understanding the Liskov Substitution Principle can help you develop better software.
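
A minimal Python sketch of the kind of problem LSP describes, using the classic Rectangle/Square illustration (my choice of example, not necessarily the one the full post uses). Square inherits from Rectangle but changes what set_width and set_height mean, so client code written against Rectangle gets a surprising result when handed a Square. ComposedSquare reuses Rectangle through composition instead, so it never claims to be substitutable for it. All class and function names here are hypothetical.

```python
class Rectangle:
    """A rectangle whose width and height can be changed independently."""

    def __init__(self, width: float, height: float) -> None:
        self.width = width
        self.height = height

    def set_width(self, width: float) -> None:
        self.width = width

    def set_height(self, height: float) -> None:
        self.height = height

    def area(self) -> float:
        return self.width * self.height


class Square(Rectangle):
    """Inheritance: a Square must keep width == height, which breaks Rectangle's contract."""

    def __init__(self, side: float) -> None:
        super().__init__(side, side)

    def set_width(self, width: float) -> None:
        self.width = self.height = width  # changing one side silently changes the other

    def set_height(self, height: float) -> None:
        self.width = self.height = height


def stretch(rect: Rectangle) -> float:
    """Client code that relies on Rectangle's contract: it expects 5 * 2 == 10."""
    rect.set_width(5)
    rect.set_height(2)
    return rect.area()


class ComposedSquare:
    """Composition: reuse Rectangle internally without claiming to be one."""

    def __init__(self, side: float) -> None:
        self._rect = Rectangle(side, side)

    def set_side(self, side: float) -> None:
        self._rect.set_width(side)
        self._rect.set_height(side)

    def area(self) -> float:
        return self._rect.area()


print(stretch(Rectangle(1, 1)))  # 10
print(stretch(Square(1)))        # 4: substituting the subclass changes the result
```

Because ComposedSquare is not a Rectangle subtype, a type checker will reject passing it to stretch, which is the point: only types that honor the base contract should be substitutable.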

Improving Legacy Code: Using a Task Queue to Speed Up a Crawler in an ETL Process

Posted on February 10, 2025 | 9 minutes | 1861 words

In this post, I will show how I made an ETL process more than 4x faster using the same resources. While the old architecture used Python threads, the new one uses a task queue architecture, which is more reliable and scalable. I will explain how we can improve legacy code and speed it up by changing only the architecture that runs it.

Almost 10 years ago, I was hired to improve a system and train a team of engineers in Python and scalable systems. The company had an ETL process that provided financial information to the finance team. The ETL was composed of a crawler, a parser, and a normalizer. The crawler's primary responsibility was crawling financial data from our partners' websites, which were protected by a username and password. The parser converted CSV, HTML, and JSON data and saved the result in our intermediate database. The last step, the normalizer, summarized the data by a set of keys and sent it to the system where the finance team could access it.
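
A minimal, in-process sketch of the task-queue idea applied to the three ETL stages, assuming hypothetical crawl, parse, and normalize functions and made-up CSV/JSON payloads (HTML parsing is left out). Here queue.Queue stands in for the broker and threads stand in for workers; in a deployment like the one described, the queue would be an external broker (the post's tags mention Celery) and the workers separate processes. It shows the shape of the design, not the post's implementation.

```python
import csv
import io
import json
import queue
import threading
from collections import defaultdict

# Hypothetical payloads standing in for files the crawler would download after logging in.
FAKE_SOURCES = {
    "partner_a": ("csv", "account,amount\nacme,10\nacme,5\nglobex,7\n"),
    "partner_b": ("json", '[{"account": "globex", "amount": 3}]'),
}


def crawl(source: str) -> tuple[str, str]:
    """Crawler stage (hypothetical): return (format, raw payload) for one source."""
    return FAKE_SOURCES[source]  # raises KeyError for an unknown source


def parse(fmt: str, payload: str) -> list[dict]:
    """Parser stage: convert CSV or JSON into uniform records."""
    if fmt == "csv":
        rows = csv.DictReader(io.StringIO(payload))
    elif fmt == "json":
        rows = json.loads(payload)
    else:
        raise ValueError(f"unsupported format: {fmt}")
    return [{"account": r["account"], "amount": float(r["amount"])} for r in rows]


def normalize(records: list[dict]) -> dict[str, float]:
    """Normalizer stage: summarize amounts by account key."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record["account"]] += record["amount"]
    return dict(totals)


def run_pipeline(sources: list[str], workers: int = 4):
    """Enqueue one task per source; workers crawl and parse, then results are normalized."""
    tasks: queue.Queue = queue.Queue()
    parsed: list[dict] = []
    failed: list[tuple[str, Exception]] = []
    lock = threading.Lock()

    def worker() -> None:
        while (source := tasks.get()) is not None:  # None is the stop sentinel
            try:
                records = parse(*crawl(source))
            except Exception as exc:  # keep the worker alive; remember the task for a retry
                with lock:
                    failed.append((source, exc))
            else:
                with lock:
                    parsed.extend(records)

    for source in sources:
        tasks.put(source)
    pool = [threading.Thread(target=worker) for _ in range(workers)]
    for _ in pool:
        tasks.put(None)  # one sentinel per worker, queued after all real tasks
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return normalize(parsed), failed
```

Each source becomes an independent task, so one failing partner site is recorded for a retry instead of stopping the whole run.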

ETL Crawler Improving legacy code Task queue Celery Architecture

Is CPython Faster Without the Global Interpreter Lock (GIL)?

Posted on July 17, 2024 | 7 minutes | 1326 words

The Global Interpreter Lock (GIL) is a mechanism in the CPython implementation that ensures only one thread executes Python bytecode at a time. This means that even if we use threads to try to speed up our programs, it will not work, because, in the end, the interpreter prevents two or more threads from running in parallel.

As the name suggests, the GIL is a lock that a thread acquires before it runs and releases at certain points, for example periodically (every 5 ms by default, configurable with sys.setswitchinterval) or when it blocks. There is only one lock, so two threads cannot hold it simultaneously. As a result, a program using threads can be slower than the same program without threads, because it pays the cost of creating and switching between threads without getting any parallelism in return. However, this is true only for CPU-bound problems, which require a large amount of CPU time, such as inverting a large matrix in pure Python. For I/O-bound problems, CPython releases the lock as soon as a thread makes a blocking I/O call. We still have only one thread running on the CPU at a time, but now a thread waiting for an I/O result does not block the other threads.
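
A rough way to observe the difference between CPU-bound and I/O-bound work, as a sketch under assumptions: cpu_task and io_task are hypothetical stand-ins, and time.sleep plays the role of a blocking I/O call because it also releases the GIL. On a standard CPython build, the threaded CPU-bound run is typically no faster than the sequential one, while the threaded I/O-bound run finishes in roughly the time of a single sleep. sys._is_gil_enabled() only exists on Python 3.13 and later, so the code falls back to assuming the GIL is on.

```python
import sys
import threading
import time


def cpu_task(n: int) -> int:
    """CPU-bound work: pure-Python arithmetic that holds the GIL while it runs."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def io_task(seconds: float) -> None:
    """I/O-bound stand-in: time.sleep releases the GIL while it waits."""
    time.sleep(seconds)


def timed(fn, args: tuple, copies: int, use_threads: bool) -> float:
    """Run fn(*args) `copies` times, sequentially or in parallel threads; return wall time."""
    start = time.perf_counter()
    if use_threads:
        threads = [threading.Thread(target=fn, args=args) for _ in range(copies)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    else:
        for _ in range(copies):
            fn(*args)
    return time.perf_counter() - start


if __name__ == "__main__":
    # sys._is_gil_enabled() was added in Python 3.13; assume the GIL is on before that.
    gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()
    print(f"GIL enabled: {gil_enabled}, switch interval: {sys.getswitchinterval()} s")
    workloads = [("CPU", cpu_task, (1_000_000,)), ("I/O", io_task, (0.2,))]
    for name, fn, args in workloads:
        sequential = timed(fn, args, copies=4, use_threads=False)
        threaded = timed(fn, args, copies=4, use_threads=True)
        print(f"{name}-bound: sequential {sequential:.2f}s, 4 threads {threaded:.2f}s")
```

Running the same script on a free-threaded build with the GIL disabled is one way to see whether the CPU-bound row changes.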

Python Threads GIL NoGIL

Lua Scripts Are Not Atomic in Redis

Posted on April 8, 2024 | 3 minutes | 611 words

Redis is a very powerful in-memory data structure store, and its API offers an excellent abstraction for manipulating its data structures. In my opinion, what makes Redis so powerful is the variety of data structures it provides and the performance of its operations. This last feature is a consequence of being an in-memory store.

As with any software, Redis has technical decisions and limitations that we need to be aware of and consider when choosing it. In this post, I will not describe a technical limitation but rather a poor description of one feature in the documentation.

Redis Lua scripts Atomic operations