Chrome with Selenium, Python and Docker

Sometimes, when I run Chrome inside a Docker container, I get the error below:

selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: crashed

This happens because Chrome uses /dev/shm for shared memory and Docker, by default, allocates only 64MB to this partition.

Here, I will describe how to reproduce this error and how to resolve it.

How to reproduce

To reproduce the error, we need a Docker image with Chrome and Python, so I created this Dockerfile. It uses a multi-stage build to install Chrome and Python.

After cloning this repository, we build the image with the command:

docker build -t python_selenium_chrome .

And then we can run a container with the command:

docker run --shm-size=1b -it python_selenium_chrome bash

Finally, we start Chrome from Python with the command: python run_chrome.py

After you run it, you will receive an error message similar to the one shown above (in the first section).

How to resolve

To resolve this problem, we only need to set an adequate size for /dev/shm. We can run the container with the command:

docker run --shm-size=1g -it python_selenium_chrome bash

Now we are running the container with 1 gigabyte of memory for /dev/shm. After this, it is possible to run Chrome normally inside the container.
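To confirm the size that /dev/shm actually received, a quick check can be run inside the container. This is only a minimal sketch using Python's standard library; the helper name shm_size_mb is my own and is not part of the repository.

```python
import shutil


def shm_size_mb(path="/dev/shm"):
    """Return the total size, in megabytes, of the filesystem that holds `path`."""
    # shutil.disk_usage returns a named tuple (total, used, free) in bytes.
    total_bytes = shutil.disk_usage(path).total
    return total_bytes / (1024 * 1024)
```

Calling shm_size_mb() in a container started with --shm-size=1g should report a value close to 1024, while a container started with Docker's default should report about 64.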

P.S. English is not my first language, and I created this blog to improve my written English. If you find any errors or have tips to improve my writing, I would love to hear them!

Identify Paragraphs

I’m learning computer vision, and I’m using OpenCV and Python to write algorithms in this field of computer science. This field is very interesting because it involves both computer science and math. It takes good ideas to solve the problems.

Here, I will show my first computer vision algorithm. It identifies paragraphs in an image of text. It assumes that the image has a white background and that the characters are black.

This algorithm has two main ideas. First, it removes blank lines and columns before finding the text. After this, it counts the number of blank lines between non-blank lines, and computes the minimum and the maximum of these counts. If the number of blank lines between two non-blank lines is greater than or equal to 90% of (maximum - minimum), we consider that we have found a new paragraph.
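The rule above can be sketched in plain Python. This is only an illustration of the idea, not the code from the repository: the function name paragraph_starts, the list-of-lists image format and the white=255 default are my assumptions. Only rows are analyzed here, because cropping blank columns does not change which rows are blank.

```python
def paragraph_starts(image, white=255, ratio=0.9):
    """Return the row indices where a new paragraph starts.

    `image` is a list of rows, each row a list of grayscale pixel
    values (assumed: white background, dark characters).
    """
    # A row is blank when every pixel in it is white.
    blank = [all(pixel == white for pixel in row) for row in image]
    text_rows = [i for i, is_blank in enumerate(blank) if not is_blank]
    if not text_rows:
        return []

    # Measure each run of blank rows between two consecutive text rows.
    # Leading and trailing blank rows are ignored automatically.
    gaps = []  # (gap length, index of the text row right after the gap)
    for prev_row, next_row in zip(text_rows, text_rows[1:]):
        length = next_row - prev_row - 1
        if length > 0:
            gaps.append((length, next_row))

    starts = [text_rows[0]]  # the first text row always opens a paragraph
    if gaps:
        lengths = [length for length, _ in gaps]
        threshold = ratio * (max(lengths) - min(lengths))
        # A gap at least as large as the threshold marks a new paragraph.
        starts += [row for length, row in gaps if length >= threshold]
    return starts
```

Note that when every gap has the same size, the threshold becomes zero and each gap is treated as a paragraph break, which follows directly from the rule as stated.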

Below, you can see the text image used as input to the algorithm:

Text extracted from Python's documentation

And in the image below, you can see the identified paragraphs:

Identified paragraphs

You can download the algorithm and get more details in the repository: https://github.com/joaojunior/identify_paragraphs_in_image

Resources in Operational Research

I wrote a list of operational research resources. The list contains the names of many books, events, groups, solvers and teachers. I published it here: awesome-operational_research. It is a GitHub repository and you can help update it :-)

I based it on the list written by Avelino about resources for the Go programming language.

You can also read this on Medium.

Callbacks in CPLEX

I tried to use LazyConstraintCallback to implement a Branch and Cut algorithm with CPLEX, but when I implemented and ran it, it was very slow compared with the version that doesn’t use callbacks. This is because a ControlCallback (LazyConstraintCallback is a ControlCallback) switches CPLEX to sequential mode by default and turns off dynamic search. I wrote this post because I couldn’t find this information in the CPLEX documentation; I only found it in this technical forum.

You can also read this on Medium.