In this post, I want to give a quick intro to Python terminology and logic, so you can jump into Python developer discussions without being completely lost.
This blog post came out of discussions with people who are getting started with Python and were wondering what they need to know to feel comfortable talking about it.
There are some topics in common with the article I wrote on “the other things you need to know about”, more specifically in the technology part, so I won’t repeat elements such as “API” or “Browser / Server”. I recommend checking that longer article if you want to know more.
Python IDE
Starting a discussion about Python means starting with your Integrated Development Environment (IDE), because one of the first questions that comes to mind when learning a programming language is “Where do I write that?”
It is something you tend to forget, because once the setup is done you hardly ever change it. But it is already something to learn when starting with Python. The IDE someone uses can already tell you a bit about the type of work that person does. I will give you the most common ones I know, but other IDEs are available and you may want to test them:
- PyCharm: It is the Python IDE from JetBrains, a well-known IDE company. Their IDE for Java (IntelliJ) is very famous, and PyCharm is popular among Python users who want to do application development. It is mostly oriented toward pure developers.
- VS Code: It is the lightweight IDE from Microsoft. It supports multiple languages, so it doesn’t really matter whether you are editing JS or Python. It is very good for its versatility and ease of use. It is also aimed at pure developers who want to build applications, but it is easier to use.
- Spyder: This application is aimed at data scientists, or developers who want to write one-file programs. You want to write a script to clean something, or to analyze something else. It is very practical because it integrates a variable explorer: you can see the variables you have in memory and their values. Super practical for debugging and understanding what your program is doing.
- Jupyter Notebook: This application runs inside your browser and gives you an interface where you can run Python commands step by step. It is very practical for sharing tutorials or processes. I think this is what makes Python so popular to learn. Most scientific work is nowadays shared this way, and even R has copied this system. But due to R’s propensity to chain methods, I find it less helpful there.
From my experience, I mostly use Jupyter, Spyder and VS Code.
I use Jupyter to create tutorials and share programs or methods that I want others to use. Spyder is where I did most of my learning, because it is so beginner friendly and perfect for learning to write a script for one task. I moved to VS Code once I got comfortable working with multiple Python files and started building more sophisticated programs such as my API wrappers.
One important head start you can give yourself is to install Anaconda. This installation provides Jupyter, Spyder and lots of data science libraries by default (we will see what libraries are below).
Basic Python
Obviously, if you want to start discussing Python, you need to know a bit about the history of the language and the context around it. For that part, you can read the Wikipedia page and you will be more than good to go.
But once you have the context, there are very few things you need to understand in order to get started. What I am going to explain is valid for most programming languages.
Here are the topics that I think you should know:
- Variable: It seems basic, but it is important to understand how information is stored in Python. Each variable is a name that refers to a value stored in memory. If you want to be a bit more advanced, two variables with the same value can share the same reference in memory.
- Variable Types: The different variable types are important to know. You don’t have to know all of the types, but string, integer, float, list, tuple, dictionary and boolean are a must. Being able to recognize them at first sight is mandatory, because people will show you code without explaining the types they are using; you’ll have to recognize them yourself.
- Basic programming logic: Here you should learn conditions (if, elif, else) and loops (for, while). Understanding what a function (def) is also matters, because functions are the building blocks of almost any program.
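To make these three points concrete, here is a minimal sketch. All names and values are made up for illustration; the goal is only to show what each notion looks like when you see it in someone else's code.

```python
# Variables: a name that refers to a value stored in memory
a = [1, 2, 3]
b = a            # b refers to the same list object as a
c = [1, 2, 3]    # equal value, but a separate object

print(a == c)    # True: same value
print(a is b)    # True: same object in memory
print(a is c)    # False: two different objects

# Common types, recognizable from how they are written
name = "python"       # string (str): quotes
count = 3             # integer (int)
ratio = 0.5           # float: decimal point
items = ["a", "b"]    # list: square brackets, can be modified
point = (1, 2)        # tuple: parentheses, cannot be modified
ages = {"ana": 31}    # dictionary (dict): key / value pairs
ready = True          # boolean (bool): True or False


# A function (def) with conditions (if, elif, else)
def describe(number):
    """Return a short label for a number."""
    if number < 0:
        return "negative"
    elif number == 0:
        return "zero"
    else:
        return "positive"


# A for loop goes through each element of a collection
labels = []
for n in [-1, 0, 5]:
    labels.append(describe(n))

# A while loop repeats as long as its condition is true
countdown = 3
while countdown > 0:
    countdown -= 1

print(labels)  # ['negative', 'zero', 'positive']
```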
Other information is always nice to know, but I believe it is not necessary to start a conversation about Python. In the end, you will always find someone who knows more than you, and there is nothing wrong with asking questions.
Random keywords to tickle your interest: class, set, lambda functions.
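If you are curious, here is a very small sketch of what these three keywords look like. The `Greeter` class and the values are hypothetical, chosen only for illustration.

```python
class Greeter:
    """A class groups data (attributes) and functions (methods) together."""

    def __init__(self, name):
        self.name = name

    def greet(self):
        return f"Hello, {self.name}!"


greeting = Greeter("Ana").greet()

# A set keeps only unique values
unique_tags = set(["python", "data", "python"])

# A lambda is a small anonymous function written on one line
double = lambda x: x * 2
doubled = double(21)

print(greeting)  # Hello, Ana!
print(doubled)   # 42
```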
The Python ecosystem
When you talk about Python, knowing some programming logic is important, but some specific libraries come up very often, and you should look them up and understand what they are used for. A library is a module (or a set of modules) that you can import in order to use its functions. It is very useful because it saves lots of time. Knowing the correct library to use is important because it is a matter of efficiency.
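As a quick illustration of what "importing a library" means, here is a minimal sketch using two modules that ship with Python (math and statistics). The values are made up.

```python
import math                    # import the whole library
from statistics import mean   # import a single function from a library

radius = 2
area = math.pi * radius ** 2   # use something defined in the library
average = mean([1, 2, 3, 4])

print(round(area, 2))  # 12.57
print(average)         # 2.5
```

The two import styles are the ones you will see most often in other people's code.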
It is expected that everyone has favorite libraries, because people usually work on specific problems that are solved with specific libraries. I will give the ones I use the most and that I feel are the most common to bring up in a discussion.
Pandas
I don’t think I have to introduce Pandas much, because every other article about Python mentions this library. It is one of the biggest reasons why Python is so popular. It makes working with data so easy that it is now my “go-to” for almost any analysis or data manipulation.
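To give an idea of what a typical Pandas manipulation looks like, here is a minimal sketch. It assumes pandas is installed (it comes with Anaconda), and the data is invented for the example.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration
visits = pd.DataFrame({
    "page": ["home", "blog", "home", "blog", "about"],
    "views": [120, 80, 100, 60, 20],
})

# Keep only the rows with at least 50 views
popular = visits[visits["views"] >= 50]

# Total views per page
views_per_page = popular.groupby("page")["views"].sum()
print(views_per_page)
```

Filtering rows and grouping values in two lines is the kind of thing people have in mind when they say Pandas makes data work easy.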
Numpy
If you are fond of Pandas and want to know a bit more about how it works, you can have a look at numpy. Pandas is partially based on numpy for array manipulation, so you could optimize your code by doing your data manipulation with numpy only. It is lower level, so a bit harder to use, but super famous among data analysts and scientists.
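Here is a minimal sketch of the array logic that numpy brings, assuming numpy is installed. The values are made up.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])

# An operation applies to every element at once, no loop needed
scaled = values * 10

total = values.sum()
average = values.mean()

print(scaled)   # [10. 20. 30. 40.]
print(total)    # 10.0
print(average)  # 2.5
```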
Requests
If you work with APIs or want to access web pages, requests is the way to go. It is not in the standard library, but I wouldn’t be surprised if it were at some point. It is so popular that few people look for an alternative. It is so easy that you won’t have any problem making your first HTTP request with it.
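Here is a minimal sketch of what a first contact with requests can look like, assuming the library is installed. The URL is a placeholder and `fetch_json` is a hypothetical helper, not something provided by requests. The second part only builds a request without sending it, so you can see how the URL is assembled.

```python
import requests


def fetch_json(url, params=None):
    """Hypothetical helper: send a GET request and return the decoded JSON."""
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()  # raise an error on 4xx / 5xx responses
    return response.json()


# Build a request without sending it (placeholder URL)
prepared = requests.Request(
    "GET", "https://example.com/api/items", params={"page": 2}
).prepare()

print(prepared.method)  # GET
print(prepared.url)     # https://example.com/api/items?page=2
```

Calling `fetch_json` on a real API endpoint would return a dictionary or a list, depending on what the server sends back.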
Matplotlib
Python allows you to plot your data quite easily with the matplotlib library. It is the base library for lots of visualization tools. You don’t have to know exactly how to use it, because I imagine you will use a wrapper around it for most of the figures you make. However, as it serves as the basis, knowing its name and roughly how it works is important.
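To show roughly how matplotlib works, here is a minimal sketch, assuming the library is installed. The data is invented, and the "Agg" backend is selected only so that the script runs without opening a window.

```python
import io

import matplotlib

matplotlib.use("Agg")  # assumption: no display available, draw off-screen
import matplotlib.pyplot as plt

# Hypothetical data for the illustration
months = ["Jan", "Feb", "Mar"]
visits = [120, 150, 90]

# A figure contains one or more axes; you draw on the axes
fig, ax = plt.subplots()
ax.plot(months, visits, marker="o")
ax.set_title("Visits per month")
ax.set_xlabel("Month")
ax.set_ylabel("Visits")

# Save the figure as PNG into memory (a file name would work as well)
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```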
I would recommend seaborn for plotting static figures. It looks better by default and makes advanced plots quite easy.
Personally, I became a big fan of Bokeh; this library is able to do almost everything. More importantly, it can be served as a dynamic dashboard. It is a bit harder to learn, but worth a try if you want to build complex visualizations.
os, pathlib, collections, json, re
There are lots of libraries that you should know because they are helpful utilities. I will name a few here with a short description:
- os: This library lets you talk to your operating system and look into your computer system.
- pathlib: This library unifies file paths across the Unix / Linux and Windows path conventions. It is a big help for writing functions that work on both systems.
- collections: This standard library module offers lots of utilities that you may want to use (defaultdict, deque, OrderedDict, Counter).
- json: This standard library module helps you decode the JSON format into a dictionary.
- re: This is the regular expression library.
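Here is a minimal sketch touching each of these utilities once. They all ship with Python, so nothing needs to be installed. The path, the JSON text and the sentence are made up for illustration.

```python
import json
import os
import re
from collections import Counter, defaultdict
from pathlib import Path

# os: ask the operating system for information
current_folder = os.getcwd()

# pathlib: build a path that works on Windows and Unix alike
report_path = Path("data") / "reports" / "2020.csv"  # hypothetical path
print(report_path.name)    # 2020.csv
print(report_path.suffix)  # .csv

# collections: Counter counts occurrences
word_counts = Counter(["a", "b", "a"])
print(word_counts["a"])    # 2

# collections: defaultdict creates a missing key on first access
groups = defaultdict(list)
groups["fruits"].append("apple")  # no KeyError here

# json: decode JSON text into a dictionary
payload = json.loads('{"name": "python", "version": 3}')
print(payload["name"])     # python

# re: find every group of 4 digits in a text
years = re.findall(r"\d{4}", "Released in 1991, version 3 in 2008")
print(years)               # ['1991', '2008']
```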
Data science libraries
Talking about Python has a high probability of leading you into data science discussions. Knowing the libraries there can help you follow along.
- Scikit-learn: This is the default data science library that everyone knows and that everyone in data science has used. It comes with lots of very good algorithms already implemented (Decision Trees, Regressions, SVM, etc.).
- TensorFlow: It is the famous library from Google. It covers part of the same ground as scikit-learn but also extends it with neural networks. It works with tensors, which look very similar to numpy arrays.
- PyTorch: This is the neural network library from Facebook. It is also quite good, and very popular for image detection.
- Keras: I have personally never used Keras, but it is a library that sits on top of other libraries so that you don’t have to deal with low-level code. It is to TensorFlow what Pandas is to numpy.
That will be it for the things to know in order to get comfortable in your (first) conversations about Python. If I missed anything, please let me know in the comments.