Get Safe Paths From Arbitrary Strings In Python

Sometimes all you want to do with an arbitrary string is use it to create a file or a directory. Really, that’s all. Nothing too special about it, right?

Alas! This is the root of all evil!

Arbitrary strings are dangerous, and should be handled with the utmost care, as if they were explosives, or Frank Underwood’s new liver! (sorry)

Wait, but, how exactly are they to be handled? And why should you reimplement this apparently basic, but practically risky, functionality every time you need it?

This is exactly what my ostrich.utils.text.get_safe_path() OstrichLib function set out to solve once and for all 🙂

It’s already available in Ostrich Lib in release v0.0. It’s also released to PyPI, meaning you can get it now with pip install ostrichlib. It’s tested (using Travis CI) against Python 2 & 3, and requires only the future library as an external dependency (which makes everyone happier with Python 2 / Python 3 compatibility). Detailed library documentation is available via Read the Docs. Hurray!

I would love to get some review from others for my solution, given the risky nature of the problem.
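To give a feel for the kind of handling involved, here is a minimal sketch of one conservative approach. It is not the OstrichLib implementation and does not mirror the get_safe_path() signature; the name sketch_safe_name and its parameters are made up for illustration, and it assumes a text (unicode) string as input.

```python
import re
import unicodedata


def sketch_safe_name(text, max_len=100, replacement="_"):
    """Hypothetical helper: reduce an arbitrary string to a conservative file name."""
    # Decompose accented characters, then drop whatever is not plain ASCII.
    normalized = unicodedata.normalize("NFKD", text)
    ascii_text = normalized.encode("ascii", "ignore").decode("ascii")
    # Replace every run of characters outside a small whitelist.
    # Path separators are not in the whitelist, so they are replaced too.
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", replacement, ascii_text)
    # Truncate, then strip leading/trailing dots and replacement characters,
    # so the result can never be "." or ".." or a hidden dot-file.
    cleaned = cleaned[:max_len].strip("." + replacement)
    # Fall back to a placeholder if nothing usable is left.
    return cleaned or "unnamed"


print(sketch_safe_name("../../etc/passwd"))
print(sketch_safe_name("hello world.txt"))
```

This sketch deliberately ignores platform-specific concerns such as reserved device names on Windows, which is part of why a reviewed library function is preferable to rolling your own each time.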

Continue Reading…

My Python 3 & Python 2 Dual Development Wishlist

I want to use Python 3 in my new Python-based projects. I really do.

It is clear that Python 3 is the future (present?), and is probably the “better language” compared to Python 2. Why not use the newer language?

Raymond Hettinger would probably say that Python 3 allows you to write more beautiful, idiomatic Python. I’m all in favor of that!

These days, I use Python 3 without much thought for small scripts that are intended for personal use, with few dependencies. This works fine, as long as I don’t reach a point that I want to use that library that doesn’t support Python 3, and then it’s a PITA…

If I want to write something that I want to share, on the other hand, choosing Python 3 is not so obvious. I want to use Python 3, because it’s the “Right Thing To Do”. I don’t want to be that “library that doesn’t support Python 3 and is someone’s PITA”. But I also want to have my stuff accessible to the masses that are still working only with Python 2, for whatever legitimate reason they may have. Like this, and to some extent this.

I wish there was an easy way to do just that – write beautiful, idiomatic, Python 3 code, while maintaining Python 2 compatibility.

I want to write beautiful, idiomatic, Python 3, and maintain Python 2 compatibility – is this too much to ask?

Yes, you can say that the Six library is a solution for writing dual Python 2 & Python 3 code. But would you argue that it grants my wish for “beautiful, idiomatic, Python 3 code”? If you think it does – please show me some examples!

Maybe you can also claim that 3to2 is a viable solution. I don’t know – I haven’t tried yet with complex real-life use-cases. Have you? Can you say this is a feasible path to take? How well does it handle the newest shiniest Python 3 stuff? What’s the development workflow around it? What do you do to catch and fix the edge-cases it doesn’t handle well, without ruining the original code with version-aware crap?

Are there other viable options? Is anyone doing this with large-scale projects, and willing to share the approach and experience?

Check Out These Python Podcasts

I recently discovered two new podcasts about Python!

I think both podcasts started about the same time – 3-4 months back. I found out about them a few weeks ago, and managed to catch up on the episode backlog 🙂

Both are based on conversations with people from the industry and Python community.

Talk Python To Me is hosted by Michael Kennedy. At the least, check out the show theme music. Beyond that, I enjoyed the technical depth of the conversations. Especially the episodes about Netflix, Flask, and Docker. The production quality is surprisingly good for a young independent podcast.

Podcast.__init__ is hosted by Tobias Macey and Chris Patti. This show could use some nicer production-fu, as much as I enjoy a whispering guest along a shouting host. Many of the conversations on this show are interesting as well, like the one with the Python guys. Personally, I enjoyed the more technical conversations, compared to the culture-themed ones.

If you’re a Python enthusiast, definitely look into these podcasts!

Using Python’s print With No Newline

Maybe I’m the only one who didn’t know that, but yesterday I learned how to make Python’s print statement not write \n.

If you’re using the Python 2 print statement to write stuff to standard output, you’re probably used to its default behavior of ending lines with \n:

> for i in xrange(5): print i

0
1
2
3
4

Sometimes you don’t want it to add the trailing newline. I used to think that in such cases, I need to switch to sys.stdout:

> import sys
> %paste
for i in xrange(5):
    sys.stdout.write('%d ' % i)
sys.stdout.write('\n')  # for the trailing newline after the last iteration

## -- End pasted text --
0 1 2 3 4

Turns out that the print statement supports the same behavior with a trailing comma!

> for i in xrange(5): print i,

0 1 2 3 4

Of course, this is covered in the documentation.
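For readers on Python 3: the trailing comma is Python 2 syntax only. In Python 3, print is a function, and the end keyword argument plays a similar role. A minimal sketch follows; print_row is a made-up helper name, and on Python 2 the same code should work after adding from __future__ import print_function at the top of the module.

```python
import sys


def print_row(values, stream=sys.stdout):
    """Hypothetical helper: write the values on one line, separated by spaces."""
    for value in values:
        # end=" " replaces the default "\n" terminator
        print(value, end=" ", file=stream)
    # a bare print() writes only the final newline
    print(file=stream)


print_row(range(5))
```

Unlike the Python 2 trailing comma, this leaves a trailing space before the final newline; use " ".join(...) if that matters.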

Every day I learn something new…

Don’t Use Python Lists Where Generators Work Better!

Sunday, February 15, 2015

I recently learned about a subtle difference between lists and generators in Python.

The gist: when passing iterables, always prefer generators over lists!

Also: when returning iterables, always prefer generators over lists!

Most of the time, the intuitive way does the Right Thing. I noticed the subtlety in some cases where my intuition didn’t do the Right Thing.

In this post I explain the difference using several examples. The last example is an important one, demonstrating how fnmatch.filter() can misbehave!
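As a small taste of the kind of difference involved (this is an illustrative sketch, not necessarily one of the examples from the full post), compare passing a list and passing a generator to any(). The helper noisy_check is made up so that the number of calls can be observed.

```python
def noisy_check(value, log):
    """Hypothetical predicate that records every value it is asked about."""
    log.append(value)
    return value > 1


values = [1, 2, 3, 4, 5]

# List comprehension: the whole list is built before any() sees it,
# so the predicate runs for every value.
list_log = []
any([noisy_check(v, list_log) for v in values])

# Generator expression: any() pulls values one at a time
# and stops at the first true result.
gen_log = []
any(noisy_check(v, gen_log) for v in values)

print(list_log)  # [1, 2, 3, 4, 5]
print(gen_log)   # [1, 2]
```

The results are the same, but the generator version does less work and never holds the full intermediate list in memory.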

Continue Reading…

Running a Time-Limited Subprocess In Python (concurrency caveats inside!)

Tuesday, January 13, 2015

I tried to write a “simple” Python function. All it had to do was run a command line in a subprocess, and enforce a timeout on that subprocess.

Turns out that this “simple” goal is not so simple. I considered multiple approaches to solve this, and ran into interesting concurrency issues.

In this post I describe a solution based on Python subprocess and various threading constructs. I demonstrate a concurrency inconsistency I discovered when testing my solution on Linux and OS X.

The conclusion, if that’s all you’re looking for, is that timer.is_alive() is not a safe way to test whether a timer has expired!

Note: My experience is based on Python 2.7 with the default subprocess module from the standard library. If you’re on Python 3.3+, a timeout argument was added to subprocess. You can also install python-subprocess32, which brings this joy to Python 2.4-2.7.
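For the Python 3.3+ route mentioned in the note, a minimal sketch could look like the following. It is not the threading-based solution this post describes; run_with_timeout is a made-up name, and the child commands use sys.executable only to keep the example portable.

```python
import subprocess
import sys


def run_with_timeout(args, timeout_seconds):
    """Hypothetical helper: run args and return (returncode, timed_out)."""
    process = subprocess.Popen(args)
    try:
        # communicate() raises TimeoutExpired if the child is still running
        # after timeout_seconds; the child is NOT killed automatically.
        process.communicate(timeout=timeout_seconds)
        return process.returncode, False
    except subprocess.TimeoutExpired:
        process.kill()
        process.communicate()  # wait for the killed child so it is reaped
        return process.returncode, True


quick = [sys.executable, "-c", "pass"]
slow = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_timeout(quick, 10))
print(run_with_timeout(slow, 0.5))
```

The return code reported for a killed child is platform-dependent, so the timed_out flag is the part to rely on.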

Continue Reading…

Quick and Dirty Personal Social Analytics With Google App Engine

I want to track basic metrics of my main social thingies – The Ostrich Facebook page, The Ostrich Twitter account, and The Ostrich Google+ page.

My short-term goal is to have simple graphs of some basic metrics over time:

  • Facebook page likes and shares, posts, post likes and shares.
  • Twitter followers (and following), tweets, favorites, retweets.
  • Google+ followers, views, +1’s, shares.

I’m not sure why I want this data, but if I do find a good reason in the future, I’d better start tracking it now!

In this post, I describe how I went from the idea to start collecting the data, to a deployed data collecting app using Google App Engine, in under 5 hours.

Continue Reading…

Manipulating Python os.walk Recursion

The os.walk function in Python is a powerful function. It generates the file names and sub-directory names in a directory tree by walking the tree. For each directory in the tree, it yields a 3-tuple (dirpath, dirnames, filenames).

It is not well-known that you can modify dirnames in the body of the os.walk() loop to manipulate the recursion!

I’ve seen programmers avoid using os.walk(), and hack their own version of it using recursive calls to os.listdir(), with various path manipulations in the process. It was rare that the programmer doing this was not familiar with os.walk(). More often than not, the reason was that the programmer wanted more control over the recursion. Had the programmer been aware that this can be done with os.walk(), she would probably have used it and saved time and sweat!

This specific feature is well documented in the Python os.walk docs. Seeing how under-used it is, I wanted to highlight it here, hoping it will serve someone out there 🙂
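A minimal sketch of the technique: prune dirnames in place so that os.walk() never descends into unwanted directories. The function name walk_skipping and the skipped directory names are made up for illustration.

```python
import os


def walk_skipping(top, skip_names):
    """Hypothetical helper: yield file paths under top, without descending
    into any directory whose name is in skip_names."""
    # Pruning only works in top-down mode, which is the default.
    for dirpath, dirnames, filenames in os.walk(top):
        # Slice assignment mutates the very list os.walk() will recurse on;
        # rebinding the name (dirnames = [...]) would have no effect.
        dirnames[:] = [d for d in dirnames if d not in skip_names]
        for name in filenames:
            yield os.path.join(dirpath, name)


for path in walk_skipping(".", {".git", "__pycache__"}):
    print(path)
```

The same in-place trick can be used to control visiting order, for example with dirnames.sort().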

Continue Reading…

Right-click Hashes and Python’s ASCII command-line

Thursday, December 11, 2014

This post is a guest post by Gil Dollberg

A while ago I wrote a Python script that calculates MD5 and SHA1 hashes on a file with a right click. Here’s the script that calculates the MD5 and the script that writes the .reg file. What you probably want to download is just the reg file – double click, install, and you’re set. Note the pythonw.exe caveat below though…
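The hashing part of such a script can be sketched with the standard hashlib module. This is an illustrative sketch, not the guest author's actual script; the function name file_hashes and the chunk size are assumptions.

```python
import hashlib


def file_hashes(path, chunk_size=65536):
    """Hypothetical helper: return (md5_hex, sha1_hex) for the file at path."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large files are not loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()
```

A right-click handler would then only need to call file_hashes() with the path it receives on the command line and display the two hex strings.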

Continue Reading…