
Modern Computer Science is obscurantism disguised as science

Recently I decided to get back to the basics and question my use of classes/frameworks.

I tried to find a definition of the word framework, but weirdly the word is circular: nobody can say exactly what it is, yet it refers to itself as something people use to get work done through external knowledge. But can you manipulate knowledge you do not understand?

The word has broadened so much that it has become a synonym for a shortcut: standing on the giants' shoulders without the painful climb of learning. Frameworks are, for me, a synonym for an intellectual shortcut; they are the foundation of cargo cult programming. I want to illustrate this with a Python example that draws a wall clock with no libraries, except for importing constants (e and pi) and drawing.

So I will introduce an anti-framework that gives you fewer dependencies by taking no shortcuts: science. Not the science of computer science departments, but the one that is also useful in the real world, hence a more useful tool than a framework. Something that lets you compute a solution quickly with pen and paper, and check the results of your code against it. The science taught in high school, requiring no degree, the one we are supposed to know when we are given the responsibility to vote. The one that helps you make enlightened decisions as a citizen.

For the example I will use a simple use case with code: an analog wall clock. I will treat several problems: 2D & 1D geometry, the sexagesimal base, drawing the hands, computing time without datetime.

Scientific reasoning, to be understandable, requires that before coding you explain what you are doing and how: introduce the problem in a simple yet consistent way, so that people grasp not the code itself but the mental process needed to interpret the code, which means defining your concepts rigorously. The stuff Agile hates: explicit requirements, clear definitions of concepts, rigor.
 
What is time?

Time in its usual hour/minute/second form is inherited from the Sumerian civilization, located in what is now Iraq (whose archeological wonders Western civilization took part in destroying), which brought mathematical knowledge to the world thousands of years ago. It was designed by people who had no computers, but who knew the Earth was round and made simple use of fractions and basic geometry. Time cannot be manipulated without understanding that the Earth is round, that it revolves around the Sun, and that noon depends on your location. Noon is set to when the Sun is at its zenith for a given place, which is basically an invariant along a meridian. Giving the time is giving the relative position of the Earth with respect to the point of maximum exposure to the Sun.

The 60/60 base is useful for astronomers using sticks and shadows, and is quite powerful in a low-tech context. An angle (360°, with degrees subdivided into 60 minutes and 60 seconds of arc) measures the rotation of the Earth relative to a reference (noon), and since the Earth's angular speed is constant, it is linearly related to time. Time measures the relative angle of your position with respect to noon, in a frame where the Earth rotates around its north/south axis. It is periodic; it is thus a measure of phase. Usual time is a spatial measure.

The Python time() function is a base-10 expression of the time elapsed since 1970-01-01 (the epoch), independent of your geographical position and of political biases (except the choice of UTC as a reference).
localtime() applies the corrections: time zone, DST, leap seconds... and politics. Something defined by arbitrary rules that is far from being one obvious canonical way.

Each hand on a wall clock rotates at a speed determined by its rank in the base.

The second hand rotates at a speed of one 60th of a turn per second.
The minute hand is 60 times slower.
The hour hand would be 24 times slower again over a full day, but by convention a clock face covers only half a day, so it is 12 times slower than the minute hand.
Base conversion, be it base 10, base 2, or sexagesimal, is a CORE concept of computer science. It is a core requirement: every developer should be able to do it without libraries.
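To make the point concrete, here is a minimal sketch of that conversion in both directions, with no datetime and no libraries (the function names are mine, not part of the clock code below):

def to_hms(seconds_since_midnight):
    """Base-60 (and base-24) decomposition of a count of seconds."""
    minutes, s = divmod(int(seconds_since_midnight), 60)
    hours, m = divmod(minutes, 60)
    return hours % 24, m, s

def to_seconds(h, m, s):
    """The inverse conversion: positional figures times their weights."""
    return (h * 60 + m) * 60 + s

print(to_hms(45296))           # (12, 34, 56)
print(to_seconds(12, 34, 56))  # 45296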

I am going to introduce a convenient tool for doing geometry, Euler's formula (a close cousin of de Moivre's formula), which will do the heavy lifting:

exp( i * theta) = cos(theta) + i * sin(theta)
https://en.wikipedia.org/wiki/Complex_number

To understand the following code, you need to understand the geometrical relationship between complex notation and cartesian/polar coordinates. That requires learning and rigor, and there is no shortcut for it.

I don't know why Python confuses j and i, and I hate it; it feels like a slap in the face to people who use science.
i is defined by i² = -1; Python decided to call it j.
j is traditionally (in the mathematical convention I was taught) the primitive cube root of unity, the root of j³ = 1 whose imaginary part is positive. Thank you, Python, for not respecting mathematical conventions; it makes this confusing.

I guess it falls under the Tao of Python:
There should be one– and preferably only one –obvious way to do it.
Although that way may not be obvious at first unless you’re Dutch.



This is not the first time I have had a beef with the Python community when it comes to respecting mathematical notations and using them consistently. I could dig into this topic further, but it would be unfair to Python, which is not even the community with the worst practices.

So, without further ado, here is the commented code without the noise of colors. Brutal code.
(Colored version here)


import matplotlib.pyplot as plt
from time import sleep, time, localtime

# Constants are CAPitalized in Python by convention
from cmath import pi as PI, e as E
# correcting python notations j => I
I = complex("j")

# matplotlib does not plot lines using the classical
# (x0, y0), (x1, y1) convention
# but prefers (x0, x1), (y0, y1)
to_xx_yy = lambda c1,c2 : [(c1.real, c2.real), (c1.imag, c2.imag)] 

# black magic: interactive mode so the plot updates without blocking
plt.ion()
plt.show()

# fixing the weird integer division behaviour in Python 2 by forcing floats

# 2 * PI = one full turn in radians (SI); the second hand makes
# a 60th of a turn per second
# an arc is a fraction of turn
rad_per_sec = 2.0 * PI /60.0
# 60 times slower
rad_per_min = rad_per_sec / 60
# wall clocks are not 24-based because humans tend to
# know whether noon has passed
rad_per_hour = rad_per_min / 12

# I == rectangular coordinate (0,1) in complex notation
origin_vector_hand = I

size_of_sec_hand = .9
size_of_min_hand = .8
size_of_hour_hand = .6

# Euler's formula is used to compute the rotation
# using units in names to check unit consistency
# rotation is clockwise (hence the minus)
# Euler's formula requires an angle in radians
rot_sec = lambda sec : E ** (-I * sec * rad_per_sec )
rot_min = lambda min : E ** (-I *  min * rad_per_min )
rot_hour = lambda hour : E ** (-I * hour * rad_per_hour )

# drawing the ticks and making them different every
# division of 5
for n in range(60):
    plt.plot(
        *to_xx_yy(
            origin_vector_hand * rot_sec(n),
            .95 * I * rot_sec(n)
        )+[n% 5 and 'b-' or 'k-'],
        lw= n% 5 and 1 or 2
    )
    plt.draw()
# computing the offset between the EPOCH and the local political convention of time
diff_offset_in_sec = (time() % (24*3600)) - localtime()[3]*3600 -localtime()[4] * 60.0 - localtime()[5]   
n=0

while True:
    n+=1
    t = time()
    # sexagesimal base conversion
    s= t%60
    m = m_in_sec = t%(60 * 60)
    h = h_in_sec = (t- diff_offset_in_sec)%(24*60*60)
    # applying a rotation AND a homothety to the vectors expressed as (complex1, complex2)
    # using the * operator of complex algebra to do the job
    l = plt.plot( *to_xx_yy(
            -.1 * origin_vector_hand * rot_sec(s),
            size_of_sec_hand * origin_vector_hand * rot_sec(s)) + ['g']  )
    j = plt.plot( *to_xx_yy(0, size_of_min_hand * origin_vector_hand * rot_min( m )) + ['y-'] , lw= 3)
    k = plt.plot( *to_xx_yy(0, size_of_hour_hand * origin_vector_hand * rot_hour(h)) +[ 'r-'] , lw= 4)
    plt.pause(.1)
    ## black magic : remove elements on the canvas.
    l.pop().remove()
    j.pop().remove()
    k.pop().remove()
    if not n % 1000:
        ### conversion in sexagesimal base
        print int(h/60.0/60.0),
        print int(m/60.0),
        print int(s)
    if n == 100:
        n=0
   


My conclusion is that frameworks and libraries make you dumb. They favor monkeys who look learned, just as pedantry in academic teaching does. People may try to point out that THIS is pedantic, but pedantry is caring about the words and the formalism, not the ideas and the concepts. It is like focusing on PEP 8 instead of the correctness of the code. Pedantry is not saying correctness is important; it is annoying developers with PEP 8.

My code says that the Earth is round, that it revolves around the Sun with a constant rotational speed, that noon is when the Sun is at its zenith and happens periodically, that I have a harmonic oscillator in my computer calibrated to deliver time as a monotonically growing function, that we have used a 60/60 base for millennia to represent time, and that most of the problems we encounter with time are either political or due to an insufficient understanding of its nature. And that we can use complex numbers to do powerful 2D geometry operations in a compact yet exact way that requires no libraries or frameworks. Complex-number operations used to be hardwired in CPUs; they became "useless" because people stopped using them out of ignorance, not because they stopped being useful.

Our actual problem is not raw computing power but education. Every coder using the datetime module should be sacked: datetime operations are (outside of the TZ insanity) basic base conversions and 1D translations and projections. If a coder does not understand what numbers are, what time is, or the difference between representations and concepts, why do you entrust them with manipulating your data in the first place? What do you expect?

That a thousand monkeys will write you the next Shakespeare if you throw enough bananas at them?

We live in a time of obscurantism: people using advanced concepts that look like science, but are not.



I still find the examples used for teaching OOP counterproductive

After the stupid examples with taxonomies of species/cars/employees that make you do a lot of inheritance, one of the valid use cases for OOP is 2D geometry: points & rectangles.

This is the python example in wiki.python.org
https://wiki.python.org/moin/PointsAndRectangles

Don't worry, Python is only one of many languages using this example; it is a real use case for learning classes.

200 lines of code and you do nothing. Not even anything fun.

Except that in Python complex is a built-in type, and I learned to use complex numbers in public high school in my sweet banlieue (a place said to be full of delinquents and uneducated kids).

Then I thought: with just simple math, how hard would it be to draw two polygons, one of them rotated, using only Python's built-in types?
How hard is programming when you keep it simple?

Here is my answer, in less than 10% of the initial number of lines of code, with a drawing to illustrate how hard it is:


from PIL import Image, ImageDraw
# short constant names a la fortran
from cmath import pi as PI, e as E
I = complex("j")
 
to_x_y = lambda cpl: (cpl.real, cpl.imag)
im = Image.new("RGB", (512, 512), "white")
draw = ImageDraw.Draw(im)
rotation = E**(I*PI/3)
homothetia = min(im.size[0], im.size[1])
trans = homothetia/2
homothetia /= 2.5
polygone = (complex(0,0), complex(0,1), complex(1,1), complex(1,0), complex(0,0))
bigger = map(lambda x: x * homothetia + trans, polygone)
bigger_rotated = map(lambda x: x * rotation, bigger)
draw.line(map(to_x_y, bigger), "blue")
draw.line(map(to_x_y, bigger_rotated), "green")

im.save("this.png", "PNG")

I don't say classes are useless; I use them. But I still have a hard time thinking of a simple pedagogical example of their use. I do, however, see a lot of overuse of Object Oriented Programming to try to let people use concepts that are not accessible without basic knowledge. OOP is no shortcut for avoiding learning math, or anything else. If you don't understand geometry and math, then whatever the quality of the class you are given, you will probably be unable to do any proper geometrical operation.

We need OOP. But the built-in complex type far outweighs in usefulness any 2D geometry class I have seen so far, without requiring any dependency. And in 5 years I never saw the complex type used even for 2D geometry; apparently the alternative buzzwords to OOP, numpy included, make you look so much more like a data scientist and a pro!

So what is the excuse, again, for not using something that works, is included in the base types, and has no requirements for doing geometry, when we say that games could be a nice way to bring teenagers to coding? Are we also going to tell them learning is useless and that reinventing the square wheel is a good idea? Are we going to tell them not to learn because we have libraries for everything?

For fun I did an OOP facade over complex as point/rectangle, but seriously, it is just pedantic and useless. Still, it is 50% fewer lines and more useful than the initial example.
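For the curious, a minimal sketch of what such a facade could look like (this is not the author's version; the class and method names are invented for the illustration):

class Point(complex):
    """A thin OOP facade: a point is just a complex number."""
    @property
    def x(self):
        return self.real

    @property
    def y(self):
        return self.imag

    def rotated(self, unit_rotation):
        # multiplying by a unit complex number rotates around the origin
        return Point(self * unit_rotation)

class Rect(object):
    """A rectangle given by two opposite corners."""
    def __init__(self, corner, opposite):
        self.corner, self.opposite = Point(corner), Point(opposite)

    def width(self):
        return abs(self.opposite.x - self.corner.x)

    def height(self):
        return abs(self.opposite.y - self.corner.y)

print(Rect(0, 3 + 4j).width())   # 3.0

It works, but as the post argues, abs(), .real, .imag and plain * on complex already give you all of this.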




Querying complex unknown database with python

In my job, I sometimes have to deal with web applications pointing to half a dozen databases with a total of more than 160 tables.

Not to say it can be complex, but sometimes it is :)

So I made a small script using 2 of my favourite technologies to deal with that: graphviz, and sqlsoup.

SQLSoup is a lesser-known child of the great software that Mike Bayer (zzzeek) made: SQLAlchemy.

I have been told that when you know an ORM you don't need to know SQL. I strongly disagree (especially when it comes to performance issues with the wrong types of indexes, or their absence). However, when you use the declarative SQLAlchemy syntax, SQLAlchemy does a lot of things right that help a lot: it creates foreign keys when you use backrefs (unless you use MyISAM). And sometimes your model gets out of sync (oh! someone did an ALTER TABLE in raw SQL), you need to do things with the data, and your model does not help.

And these foreign keys help a lot to construct an entity-relationship diagram of the Python objects, which can be used... to navigate easily through the data. The model may be a nice map, but sometimes you need to rebuild it when the map is not the territory anymore. And time is pressing.
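As an illustration of the idea (this is not the author's gist, which is linked below), here is a minimal sketch that reflects the foreign keys with plain SQLAlchemy and writes a graphviz dot file; the function name and output path are made up:

from sqlalchemy import create_engine, inspect

def dump_fk_graph(db_url, out="out.dot"):
    """Write a graphviz digraph with one edge per foreign key."""
    insp = inspect(create_engine(db_url))
    with open(out, "w") as f:
        f.write("digraph db {\n")
        for table in insp.get_table_names():
            f.write('  "%s";\n' % table)
            for fk in insp.get_foreign_keys(table):
                f.write('  "%s" -> "%s";\n' % (table, fk["referred_table"]))
        f.write("}\n")

# dump_fk_graph("postgresql://tg@localhost/tg")
# then: dot -Tpng out.dot -o out.png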


gist here https://gist.github.com/jul/e255d76590930545d383


Here, as a test case, I used a TurboGears quickstart, a framework I consider under-rated. It is very well written and correct, and it has a lot of production features I like: the possibility to easily swap out the technologies you don't like in the stack (Mako or Genshi, your choice), database versioning...

The only data a quickstart builds is the authorization/authentication tables.

Here is the result of the script (of course, installing graphviz is required to turn the out.dot file into a picture, and as you can see the Windows version has some font glitches):

turbogears quickstart database construction

And what is nice is that, using the interactive option, you can go straight into the database with your diagram under your nose and get your data directly:

ipython -i generate_diagram.py postgresql://tg@localhost/tg
 

#postgresql://tg@localhost/tg problem with u'migrate_version'
#(...)
#SQLSoupError("table 'migrate_version' does not have a primary key defined",) 
#nb col = 17 
#nb fk = 4 
In [1]: db.tg_permission.join(db.tg_group_permission).join(db.tg_group).join(db.tg_user_group).join(db.tg_user).filter(db.tg_user.user_name == 'editor').all()
Out[1]:
[MappedTg_permission(permission_id=2, permission_name=u'read', description=u'editor can read'),
 MappedTg_permission(permission_id=3, permission_name=u'see stats', description=u'can see stats')]

I thought of making a module, but in the turmoil of production you don't always have time for that. A single short script with very few requirements, that can be tweaked quickly, is sometimes exactly what you need.

Note: Alembic (also from Mike Bayer), which is nicely included in TurboGears, can of course generate SQLAlchemy models from a DB and even diff a database against an SQLAlchemy declarative model. I don't know the guy, but I must admit that even if SQLAlchemy is a tad heavy, when used correctly his software is great. And even if it is heavy, pip install works fairly well given the stability and complexity of the API.

Mister Mike Bayer, if you are reading this: I am a fanboy of yours, and I like how you answer people on the mailing list.

PS: yes, I know about the graphviz Python module. But I use graphviz so often that it is faster for me to just write the dot file directly.

Using python to visualize randomness

Heard of randomness?

What is random? Something for which, given a series of chronological events (happening one after another), however long the time series is, we cannot predict the future.

Two kinds of events look random: purely random (stochastic) events, and mathematically deterministic equations that are so sensitive to initial conditions that they are very hard to predict.

A PRNG is the second kind of beast.

Takens series are an intuitive way of trying to see whether something is random in a phase space. Here, because our brains are limited, I took only 3 dimensions. Anyway, matplotlib sucks at projecting from n dimensions down to 3 in a way that is easy to read.

You take segments of the time series and plot the series as a function of its lagged self. For fun I added an alpha channel (the further in the past, the more faded the color), and I added the variation of the variations as quivers.
Script here : https://gist.github.com/jul/0f16782ed01f18c2c72a
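If you just want the idea without reading the gist, here is a minimal sketch of a 3D delay (Takens) embedding; takens_3d and the lag of 1 are my own choices, not the gist's:

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: registers the 3d projection

def takens_3d(series, lag=1):
    """Return the delay embedding (x(t), x(t+lag), x(t+2*lag))."""
    x = np.asarray(series, dtype=float)
    n = len(x) - 2 * lag
    return x[:n], x[lag:lag + n], x[2 * lag:2 * lag + n]

data = np.random.randint(0, 32, 2000)   # the same kind of data as in the post
ax = plt.figure().add_subplot(111, projection="3d")
ax.plot(*takens_3d(data), lw=0.3, alpha=0.5)
plt.show()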

Here are the results:
First-order Takens embedding of GE opening prices, swirling in the negative direction

Nice vortexes :)
2000 random ints between 0 and 32. The more faded, the older the data.
Another point of view on randint


Takens series were once used in an article to analyze the randomness of TCP/IP sequence numbers and the impact of a PRNG's lack of randomness.

This article is a good introduction to how to use this tool on sampling randomness : http://lcamtuf.coredump.cx/oldtcp/tcpseq.html

I also got confirmation that some regulatory offices use Takens series to see whether people are cheating online :)


Complex systems are both resistant to perturbations and unpredictable. A lot of phenomena are like this: simple to understand, but made of many interacting elements: the weather, the movements of the stars, your heart... our IT infrastructures.

Takens series can be tweaked in more than one way :)

Addendum: being a former physicist, I used periodic boundary conditions, because they do not essentially alter the result and they make the code more readable, even if it is incorrect for 4 points.
Why 5 offsets? Every derivative I take consumes one degree of freedom,
so 3 dimensions + 2 derivatives = 5 offsets. It means that, playing around, I can devise as many phase spaces as I have elements in the time series per dimension, minus the order of the derivative I want to display.

PS: I forgot to seed the PRNG so that we could "see" the same things. Random data being random, without seeding the results will naturally differ according to the seed.

So I wrote a Proof of Concept language to address the problem of safe eval

I told fellow coders: «Hey! I know a solution to the safe eval problem: it is right under my eyes. I think I can code it from scratch in less than 24 hours. It will support safe templating... because that's the primary purpose for it.»


TL; DR:


I was told my solution was over-engineering, because writing a language is so much effort. Actually, it took me less time to write a language without any theoretical knowledge than the time I have lost in my various jobs, every single time, dealing with unsafe eval.

Here is the result in Python: a Forth-based templating language that actually covers 90% of the real use cases I have experienced, which is a fair balance between time to code and the features people really use.


You don't actually need that many features.

https://github.com/jul/confined (+pypi package)

NB Work in progress

 

How I was tortured as a student


When I was a student, I was kindly helped through the hell of my chaotic studies by people at a university called the ENS.

In exchange for their help, I had to write code for data measurement in labs, with various languages, OSes, and environments.

I was tortured because I liked programming, and I did not have the right to do OOP, use malloc, or use new languages... Perl, Python, new versions of the C standard...

Even for handling numbers, the scientists despised Perl/Python because of their inability to handle math safely. I had to use the «Numerical Recipes» and/or Fortran. (I checked: in 2005 they tried Python and were disappointed; I guess since then they might use numpy, which is basically a binding to safe Fortran ports of Numerical Recipes-style routines.) I was working on chaotic systems that are really sensitive to initial conditions... a small error in the input propagates fast.

These people were saying: we need this code to work, we need to be able to reuse it, and we need our output to be reproducible and verifiable: KISS. Keep It Simple, Stupid. And even more stupid than that.

So I was barred from any unbounded resource behaviour and any unsafe behaviour with base types.

Actually, out of curiosity, I recompiled code I wrote at the time, C piping its output to Tcl/Tk to draw graphical representations of multi-agent simulations, and it still works... It was written in 1996.

That's how I learnt programming: by doing the most unfunky programming possible. I thought they were just stupid grumpy old men.

I also had to use scientific equipment and software. Oddly enough, they all used Forth-like RPN notation to give users some basic programmability.

Like:
  1. ASYST
  2. RRD Tools
  3. pytables NUMEXPR extension http://code.google.com/p/numexpr
And I realized I understood why:

FORTH is easy to implement:
  • it uses a simple left-to-right parsing technique: no backtracking, no states;
  • the grammar is easy to write;
  • the memory model makes it easy to confine within boundaries;
  • its serialization is immutable (you can dump the exec and data stacks and safely resume/start/transport them);
  • it is thus well suited to parallelization;
  • it can thus be used in embedded contexts (like measurement instruments that need to be autonomous AND programmable).
So I decided to give myself one day to code a safe, confined interpreter in Python.
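To give an idea of why this felt tractable, here is a toy RPN evaluator; it is not the confined package (it has no confinement at all) and the operator table is purely illustrative:

from decimal import Decimal

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}

def rpn_eval(source):
    """Evaluate a whitespace-separated RPN expression, strictly left to right."""
    stack = []
    for token in source.split():
        if token in OPS:
            b, a = stack.pop(), stack.pop()
            stack.append(OPS[token](a, b))
        else:
            stack.append(Decimal(token))   # numbers only, parsed explicitly
    return stack

print(rpn_eval("1.10 2.20 + 3 *"))   # [Decimal('9.90')]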

I was told it was complex to write a language, especially when, like me, you have never had any lessons in (or interest in) parsing and language theory, and you suck at mathematics.


Design choices


Have the minimum dependency requirements: the stdlib only.

 One number to rule them all 

I have been bitten so many times in web development by floating point numbers, especially for monetary values, that I wanted a number type that could do fixed-point arithmetic. I have also been bitten so many times by problems where the input was sensitive to initial conditions that I wanted a number type better than IEEE 754 for keeping errors under control.
So I went for the stdlib number type based on the (semi-official) IEEE 854 standard: https://docs.python.org/2/library/decimal.html
Other advantages: the string representation is canonical and its regexp is well known, so it is easy to parse.
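A quick reminder of why Decimal rather than float for money (plain stdlib behaviour, nothing specific to the project):

from decimal import Decimal

print(repr(0.1 + 0.2))                  # '0.30000000000000004'
print(Decimal("0.1") + Decimal("0.2"))  # 0.3, the exact fixed-point result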

In the face of ambiguity, refuse the temptation to guess

I will try to see input as (char *) and make the decoding explicit.
Rationale: if you work with SIP (I do), headers are latin-1, and if you work in an international environment you may have to face incorrectly encoded data that could also be UTF-8, and people where I live (Québec) love to use accents éverywhere. So I want to be able to use it myself.

It is also the reason I used my check_arg library to enforce type checking of my operators and to document things with a KISS approach: function names should be explicit and their arguments should tell you everything.

Having a modular grammar so that operators/base types can be added/removed easily. 


I mentioned in a previous post how we cannot do a safe eval in Python because keywords and base types cannot be controlled. So I decided to have a dynamic grammar built at tokenization time (the code can do it; it is just not yet exposed through the API).

Avoid nested data structures and recursive calls


I wanted to make a language my mentors could use safely. I may implement recursive eval in the future, but I will enforce a very limited recursion depth. I also see a way to replace nested calls by using the stack.

Stateless and immutable only


I have seen people pickling functions so many times that I decided to have something more usable for remote execution. I also wanted my code to be idempotent: if parsing is seen as a function, I wanted to guarantee that

parsing(Input, Environment) => Output

is always the same for the same input.
We can also serialize the exec stack and the data stack at any given moment and modify them later. I want no side effects; as a result there will be no time-related functions.

As a result you can safely execute remote code.

Resource use should be controlled


Stack size, input size, recursion level, the initial state of the interpreter (default encoding, precision, number behaviours): I want to control everything (that is what the context will be for, and all its parameters WILL be mandatory), so that I can guarantee as much as I can (I was even thinking of writing C extensions to ensure we DON'T use atof/atoi but strtol/strtof...).

This way I can avoid using an awful lot of virtual machines/docker/jails/whatever.

Grammar should be easy to read


Since I don't know how to parse, but I love Damian Conway, I looked at Regexp::Grammars and said: Oh! I want something like this.

There are numerous resources on Stack Overflow on how to parse the various base types exactly (floats, strings), how to alternate patterns... so it took me 3 hours to come up with a way to do it. I still know nothing about parsing theory, but I knew I would get a result.

I chose a grammar that can be written so as to avoid backtracking (left-to-right parsing helped a lot), to keep the regexps under control.

I am not sure of everything it does, but I am pretty sure it can be ported to C or to whatever guarantees NO nested/recursive use of resources. (The regexps are not supposed to remain in a hardened version; this is just a good-enough parser written in 3 hours with my insufficient knowledge.)

I still think Perl is right


We should run our unit tests before installing. So my module refuses to install if the single actual test I wrote (as a POC) does not pass.


Conclusion


So it was really worth the time spent. And now I may be in the «cour des grands» (the big league) of coders who have implemented their own language, from scratch and without any prior theoretical knowledge of how to write one. I have been geeking out alone in front of my computer, and my wife is pissed at me for not enjoying the day and behaving like an autist, but I made something good enough for my own use case.

And handling requirements with Python and running tests before install is hellish.

(Argh... and why does my doc not show up on PyPI?)

Eval is even more really dangerous than you think


Preamble: I know about this excellent article:
http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html

I have a bigger objection to using eval than Ned's: Python has potentially unsafe base types.

I had a discussion with a guy at PyCon about being able to safely process templates and do simple user-defined formatting operations without rolling your own home-made language, with data coming from user input interpolated by Python, using Python only for the basic operations.

My friend told me that interpolating data with Python, with all builtins and globals removed, would be faster than writing a language. After all, letting your customer specify "%12.2f" in his custom preferences for item prices can't do any harm. He even said: nothing wrong can happen, I even reduce the risk with regexp validation; and they don't have room to fit Ned's trick in 32 characters, so how much harm can you do?

His regexp was complex, and I asked him: can I try something?

And I wrote "%2000.2000f" % 0.0, then '*' * 20, and 2**2**2**2**2.

All of them passed validation.

Nothing wrong, right?
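To make the failure mode concrete, here is a sketch of the kind of validation that gives a false sense of safety; the regexp is invented for the example, not the one from that discussion:

import re

# a "reasonable looking" whitelist: text plus one %-format float
SAFE_LOOKING = re.compile(r"^[\w\s%.\d]*%\d+\.\d+f[\w\s]*$")

template = "%2000000.2000000f"      # passes the check...
assert SAFE_LOOKING.match(template)

# ...yet formatting a single float with it burns CPU and allocates megabytes
print(len(template % 0.0))          # a string of about 2 million characters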

My point is that even if we patched Python's eval function and/or managed to sandbox Python, Python is inherently as unsafe as Ruby and PHP (and Perl) in its base types.

And since we can't change the behaviour of the base types, we should never let people use a Python interpreter, even a reduced one, as a calculator or a templating language on uncontrolled user input.

Base types and keywords cannot be removed from any interpreter.

And take the string expression:

"*" * much

This will multiply the string to much bytes and thus allocate that memory... (the same goes for Perl, PHP, Ruby, Bash, Python, Vimscript, Elisp).
And it can't be removed from the language: the * operator and the base types are part of the core of the language. If you change them, you have another language.

"%2000000.2000000f" % 0.0 is fun to execute; it is CPU hungry.

We could change this, but I guess a lot of applications out there depend on Python/Perl/PHP/Ruby NOT throwing an exception when you do "%x.yf" with x+y bigger than the possible size of the number. And where would we set the limit?

Exposing any modern scripting language as a calculator is like being a C coder who still does not understand why careless use of printf/scanf/memcpy deserves immediate elimination from the C dev pool.

Take the int... when we overflow, Python dynamically allocates a bigger number. And since the exponentiation operator is right-associative, a tower like 2**2**2**2**2 evaluates as 2**(2**(2**(2**2))) = 2**65536 and grows even faster, allocating huge amounts of memory in a handful of iterations. (Ruby does this too; Perl requires Math::BigInt to get this behaviour.)

It is not that Python is a bad language. It is an excellent one, partly because of «these flaws». C knight coders like to bash Python for this kind of behaviour, this uncontrolled use of resources. Yes, but in return we avoid the hell of malloc and have far fewer buffer overflows, bugs that cost resources too. And C does not avoid this either:

#include <"stdio.h">

void main(void){
    printf("%100000.200f", 0.0);
}

And OK, JavaScript does not have the "%huge.huge" formatting bug (nicely done, JS), but it probably has other ones.


So, the question is how to be safe?

As long as we don't have powerful interpreters like Python with resource control, we have to resort to other languages.


I may have an answer: use Lua.

https://pypi.python.org/pypi/lupa

I checked: most of this explosive base-type behaviour does not happen there.

But please, never use Ruby, PHP, Perl, Bash, Vim, Elisp, ksh, csh, or Python as a reduced interpreter for basic scripting operations or templating on uncontrolled user input (by controlled I mean reviewed by a human who knows how to code). Even as a calculator it is dangerous.

What makes Python a good language also makes it a dangerous one. I like it for the same reasons I fear letting user input be interpreted by it.

EDIT: format http://pyformat.info/ is definitely a good idea.
EDIT++: http://beauty-of-imagination.blogspot.ca/2015/04/so-i-wrote-proof-of-concept-language-to.html

An opinionated versioning system based on mapping version strings to numbers in a weird base

While we have a convention in python for numbering: 
http://legacy.python.org/dev/peps/pep-0440/

We can mostly say that, thanks to "Windows 9", version numbering has had an interesting spotlight shed on version comparison.

There are two camps of version handling:
- the naïve, who consider versions to be strings;
- the picky, who consider a version to have a very dark grammar that requires an ABNF-compliant parser.

Well, of course, I don't agree with anyone :) Versions are just monotonically growing numbers written in a weird base, but they must support at least the comparison operators: equal, greater/less than, is_in.

The naïve people are wrong, of course

 

It gives the famous reason why Windows might have jumped straight to Windows 10: legacy code checking whether the version string starts with "Windows 9" to detect Windows 95/98.
But is it better than this?
https://github.com/goozbach/ansible-playbook-bestpractice/blob/915fce52aa82034cfd61cfbfefad9cf40b1e4f48/global_vars.yml

In this Ansible playbook, they might have a bug when CentOS 50 comes out.

So this does not seem to hit only the «clueless» proprietary coders :)


The picky people are much too right for my brain


Yes, the Python devs are right, we need a grammar, but we don't all do Python.

Given Perl, FreeBSD, Windows... our software needs versions not only for interaction with modules/libraries within its natural ecosystem (for instance pip), but they should also fit nicely into the version conventions of the surrounding layers (OS, containers, other languages' conventions when you bind to foreign-language libraries...). Version numbering needs a standard. Semantic versioning proposes a grammar but no parsers. So here I am to help the world.

The problem is that we cannot remember one grammar per language/OS/ecosystem, especially if they conflict.

PEP 440, with its weird post/pre special cases, does not look very inspired by the Tao of Python (in the admittedly wrongful opinion of someone who did not take the time to read the whole distutils mailing list because he was too busy fighting software bugs at his job, and doing nothing at home).

So, as always when there are already a lot of standards you don't understand or can't choose between... I made mine \o/

Back to basics: versions are monotonically growing numbers that don't support + - / *, just comparisons

 

A version is a monotonically growing number.

Basically, if I publish a new version it should always compare greater than the previous one, which is basically a property of numbers.

In fact, a version can almost be seen as a 3-digit (or n-digit) number in a special numbering scheme, something like

version_number = sum(map(project_number_in_finite_base, "X.Y.Z".split(".")))

The problem is that if we reason in fixed-base logic, we have an Intel-memory-addressing kind of problem: since each of X, Y, Z can cover an infinite range of values, monotonic growth can be lost (orderings can become confused).

So we can abstract a version number as figures in an infinite base that are directly comparable.

Luckily, I am using a subset of PEP 440 for my numbering, which is the following: http://vectordict.readthedocs.org/en/latest/roadmap.html

By defining
X = API > Y = improvement > Z = bugfix

I state for the user that, given a version of my software, I guarantee the version numbers grow monotonically along the X / Y / Z axes, in such a fashion that you can focus on API compatibility, implementation (if the API stays the same but the code changes without a bug, it is a change of implementation), and correctness.

Like some devs, I also informally use "2a", as in 1.1.2a, to flag a short-term bugfix that does not satisfy me (I thus strongly encourage people to switch from 1.1.2x to 1.1.3 as soon as it comes out). I normally keep the «letter thing» to the last number.

If people are fine with API 1 / implementation 1, they should easily be able to pin versions and grow to the next release without pain.

So how do we compare numbers in an infinite-dimensional base in Python?

Well, we have tuples \o/

Thanks to tuple comparison semantics, tuples can be treated as numbers when it comes to "==" and ">", and these are the only 2 basic operations we should need on versions (all the other operations can be derived from them).

A version is a monotonically growing number, but in a non-fixed base.

Next_version != last_version + 1

If a version is a number V, comparison of V1 and V2 makes sense; addition or subtraction cannot.

One of the caveats of version numbering, though, is our confusing jargon:
if we decided versions were X.Y.Z, why do we expect version 2 to be equivalent to 2.0.0 rather than 0.0.2? Because when we say Python version 2 we expect people to hear Python version 2.x, preferably the latest. Same for Linux 2 (covering 2.x.y...): it is like writing the number «20» as «2» and expecting people to correct it from context.

So the exercise of version comparison is having a convention for comparing numbers along the API, implementation and bugfix dimensions, hierarchically, while coping with the indeterminacy introduced by our inconsistent human notation.
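A minimal sketch of this tuple-based comparison (not the author's parser, and it ignores the letter suffixes); zero-padding implements the convention that 2 reads as 2.0.0:

def as_number(version, width=3):
    """Map "X.Y.Z"-style strings to comparable tuples, padding with zeros."""
    figures = [int(f) for f in version.split(".")]
    return tuple(figures + [0] * (width - len(figures)))

assert as_number("2") == as_number("2.0.0")
assert as_number("1.10.0") > as_number("1.9.3")   # tuple comparison gets it right
assert "1.10.0" < "1.9.3"                         # naive string comparison gets it wrong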


Just for fun, I made a parser for my own version strings following this numbering convention, including the twist above where 2 means 2.0 or 2.0.0 when compared to 1.0 or 1.0.0. It handles the examples to solve given in PEP 440.


It can be seen here.


Wrapping up


For me, a version is an abstract representation of a number in an infinite base whose figures are hierarchically separated by dots and read from left to right.
I am saying each figure is itself a tuple in a two-dimensional space of digits and letters, where the digits matter more than the letters. (Yes, I am putting a number inside a figure, it is sooo fractal.)

But most important of all, I think a version string is the representation of a monotonically growing number.

I am pretty sure PEP 440 is way better than my definition: it has been crafted by a consensus of people I deeply respect.

My problem is that I need to achieve the same goal as them with less energy than they have to spend on modeling what a version number is.

That is the reason why I crafted my own deterministic version numbering, which I believe to be a subset of PEP 440.

Conclusion

 

My semantics might be wrong, but at least I have a KISS versioning system that works as announced, is easily portable, and for which I have a simple grammar that does quite a few tricks and an intuitive understanding.

And human beings are wrong too (why is version 2 read as 2.0.0 when compared to 2.1.1, as 2.0 when compared to 2.1, and as 2 when compared to 3?), but who cares? I can simply cope with it.

NB: it works with the "YYYY.MM.DD.number" (SOA serial) scheme too.

PS: thinking of adding the y-rcX stuff by slightly enhancing the definition of a figure.

PPS: I don't normally like talking to people, so I disabled comments, but for this one I am making an effort: http://www.reddit.com/r/programming/comments/2iejnz/an_opinionated_versioning_scheme_based_on_mapping/
because I am quite curious about your opinions.

Perfect unusable code: or how to model code and distributivity

So let's talk about what deterministic and non-deterministic code really are.

I am going to prove that you can achieve nearly chaotic series of states with deterministic code \o/

Definitions:

Deterministic: code is deterministic if the same input always yields the same output.

Chaotic: a time series of values is considered chaotic if knowing the previous n samples, however many, does not enable you to predict the t+1 term.

Turing machine: a computer that is worth no more than a cassette player.

Complex system: a set of simple deterministic objects connected together that can result in non-deterministic behavior.

Lambda function: a stateless function, without internal state.

FSM (finite state machine): something necessary in electronics because time is relativistic (Einstein).

Mapping: a mathematical operation (or a piece of computing machinery) that describes a projection from a discrete input space A to a discrete output space B.


Now let's play at being a real-life Turing machine.

Imagine I give you an old K7 (cassette) player with a 30-minute tape, and at every minute mark the tape tells you the result of n x 3.
If you go to minute 3, the tape says 9.
If you go to minute 5, you hear 15.

This is the most stupid computer you can have.
My tape is a program. The index (in minutes) is the input, and the output is what is said.

So let's do it in Python. Basically, we made a mapping from the index on the tape (in minutes) to the integer index x 3.
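The code block that originally illustrated this did not survive the export; here is a minimal reconstruction of the idea (compile_to_int, apply_code and play are my names):

# the "tape": precompute every answer for every possible input
TAPE = [minute * 3 for minute in range(30)]

def play(minute):
    """Reading the tape at a given index is the whole computation."""
    return TAPE[minute]

def compile_to_int(f, n_inputs=256):
    """Pack the truth table of a boolean function into the bits of one int."""
    return sum(bool(f(i)) << i for i in range(n_inputs))

def apply_code(code, x):
    """Evaluate a compiled function: just read bit number x."""
    return bool(code >> x & 1)

div2 = compile_to_int(lambda x: x % 2 == 0)
div3 = compile_to_int(lambda x: x % 3 == 0)

print(play(5), apply_code(div2, 10), apply_code(div3, 10))   # 15 True False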



So what do we learn from this?

That I can turn code into a Turing machine that I can use as the code itself, with a 1:1 relationship: I have a... mapping \o/

What does the compile step do?
It evaluates, for every possible input (an integer in [0:255]), every possible output of the boolean function. It is a projection of 2^8 inputs onto 2 outputs.
I projected a discrete space of inputs onto a discrete space of outputs.

Let's see why it is great

My code is fully deterministic and thread-safe because it is stateless.

It is an index of the 256 solutions of f(x), one for every possible value.

If I encode a function that tells whether a number is divisible by X, and another one for Y, then to get the function that tells whether a number is divisible by (X * Y) (for coprime X and Y) I just have to apply & (the bitwise and operator) to the ints representing the codes.

An int is a very cool storage format for a function.
With div2 / div3, by applying the «common bitwise operators» I can create a lot of interesting functions:

div2xor3: a code for numbers divisible by 2 or 3 but not by 6
not div2: every odd number
div2or3: multiples of 2, 3, and 6
div2and3: multiples of 6 only
....

I can combine the 16 possible bitwise (blitter-style) operations to obtain functions directly.

In functional programming you build partial functions that you apply in an execution pipeline; here you can directly combine the code at the «implementation level».


Evaluation always takes the same number of cycles, I don't have to worry about the worst case, and my code will never suffer from indeterminacy (neither in execution time nor in results). My code is ultimately thread-safe as long as the code storage and the inputs are immutable.


My functions commute, thus I can distribute them:

div2(div3(val)) == div3(div2(val)) (== div6(val))

=> combining functions is a simple AND of the codes
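To see it run, here is a self-contained check of that claim, using the same bit-packing encoding as the sketch above (the variable names are mine):

# bit i of the code is f(i)
div2 = sum((i % 2 == 0) << i for i in range(256))
div3 = sum((i % 3 == 0) << i for i in range(256))

div6 = div2 & div3                  # AND of the truth tables == "divisible by 6"
assert all(bool(div6 >> n & 1) == (n % 6 == 0) for n in range(256))

div2_xor_3 = div2 ^ div3            # divisible by 2 or 3 but not by 6
not_div2 = ~div2 & (2**256 - 1)     # complement, masked to the 256 known inputs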

Why we don't use that in real life

First, there is a big problem of size.

To store all the results for all possible inputs, I have to allocate the cross product: number of possible inputs * size of an output.

A simple multiply-by-3 table for all 32-bit integers would be 2^32 entries of 32 bits each: an array of 4 billion 32-bit words, 16 GB!

Not very efficient.

But if we work on a torus of discrete values, it can work :)

Imagine my FPU is slow and I need cos(x) with an error margin for which a resolution of 1/256 of a turn is enough. I can store my results as an array of 256 precomputed cosine values indexed modulo 256 :)
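A sketch of such a lookup table; the 256-step resolution is just the example's choice:

import math

# precompute cos for 256 evenly spaced angles over one full turn
COS_TABLE = [math.cos(2 * math.pi * k / 256) for k in range(256)]

def fast_cos(theta):
    """Table lookup instead of the FPU; the error is bounded by the 1/256 step."""
    return COS_TABLE[int(theta * 256 / (2 * math.pi)) % 256]

print(fast_cos(math.pi), math.cos(math.pi))   # -1.0 -1.0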

A cache with memoization also uses the same principle:
you replace long-running code with a lookup in a table.

It may be a little more evolved than reading a bit in an integer, but it is globally the same principle.

So actually, that is one of the biggest uses of this Turing-machine view: efficient caching of precomputed values.

Another flaw is that the mapping makes you lose information about what the developer meant.

If you only have the integer representing your code, more than one function can yield the same code: the mapping from the space of possible functions to the space of solutions is a (non-injective) surjection.

Thus, if you have a bug in this code, you cannot go back to the algorithm and fix it.

If I consider that I have not an n-bit number as input but n 1-bit inputs constituting my state input vector, and that the output is my internal state, then I am modeling a node of a parallel computer. This «code» can be wired (at a cost of a few clock cycles) as a multiplexer that is deterministic in its execution time and dazzlingly fast.


What is the use of this anyway?

Well, it models deterministic code.

I can generate random codes and see how they interact.

Conway's Game of Life is a set of such Turing machines interconnected in a massively parallel fashion.

So my next step is to prove that I can generate random-looking numbers with totally deterministic code.

And I can tell you the condition I can prove for my modified Game of Life to yield chaotic-looking results: the level of similarity between the codes of the automata must be low (the entropy of the patterns is high) AND about 50% of the bits in each code must be 0/1 (maximizing the entropy of the code in terms of bit ratio).

 






Auto documenting and validating named arguments

In real life there are companies where keyword arguments (**kw) are prohibited, and others where positional arguments stack up as high as seven. Let's state the obvious concerning arguments:

Human beings are flawed


Positional arguments are fixed arguments. Some are mandatory, others are optional (their defaults being set at function definition).

Named arguments are a way to pass arguments that are known by their names, not their positions.

If we humans were not limited by our short-term memory, positional arguments would be enough. Alas, we are limited to 7 items in memory, plus or minus 2. I strongly advise remembering that even if you can go up to 9 because you are a genius, when you are woken up at 3 am after a party for a critical bug, your short-term memory might drop to 5 ± 2 items. So be prepared for the worst, follow my advice, and try to stick to 3 mandatory positional arguments as much as you can.

Then there is the case against named arguments.

Named arguments are great for writing readable function calls, especially when there are a lot of optional arguments or when calls can be augmented on the fly thanks to duck typing and all the funky stuff that makes programming fun.

However, documenting the function may seem complex because you have to do it by hand, as you can see here:
https://github.com/kennethreitz/requests/blob/master/requests/api.py

Plus the signature of the function is quite ugly.

    
get(url, **kwargs)
        Sends a GET request. Returns :class:`Response` object.
        
        :param url: URL for the new :class:`Request` object.
        :param **kwargs: Optional arguments that ``request`` takes.

So, even if named arguments are great, they are painful to document (thus a little less maintainable), and they give the function a cryptic signature when used in the **kwargs form.


Having explicit named arguments with default values is therefore «more pythonic», since:

Explicit is better than implicit.

Decorators for easing validation and documentation



Since I am an advocate of optional named arguments and I find them cool, I thought: why not write code...

>>> @set_default_kw_value(port=1026,nawak=123)
... @must_have_key("name")
... @min_positional(2)
... @validate(name = naming_convention(), port = in_range(1024,1030 ))
... def toto(*a,**kw):
...     """useless fonction"""
...     return 1

... that would magically return a documentation looking like this:


toto(*a, **kw) useless fonction

keywords must validate the following rules:
  • key: <port> must belong to [ 1024, 1030 [,
  • key: <name> must begin with underscore
at_least_n_positional :2

keyword_must_contain_key :name

default_keyword_value :
  • params: port is 1026,
  • params: nawak is 123

The idea was just to make a decorator-building class with reasonable defaults that enhances the decorated function's documentation, in the spirit of functools.wraps.


class Sentinel(object):
    pass
SENTINEL=Sentinel()

def default_doc_maker(a_func, *pos, **opt):
    doc = "\n\n%s:%s" % (a_func, a_func.__doc__)
    posd= "%s\n" % ",".join(map(str, pos))  if len(pos)  else ""
    named = "\n%s" % ",\n".join([ "* params: %s is %r"%(k,v) for k,v in opt.items() ]
        ) if len(opt) else ""
    return """
**%s** :%s
%s""" % ( 
        a_func.__name__,
        posd,
        named
    )


def valid_and_doc(
            pre_validate = SENTINEL,
            post_validate = SENTINEL,
            doc_maker = default_doc_maker
        ):
    def wraps(*pos, **named):
        additionnal_doc=""
        if pre_validate is not SENTINEL:
            additionnal_doc += doc_maker(pre_validate, *pos, **named)
        if post_validate is not SENTINEL:
            additionnal_doc += doc_maker(post_validate, *pos, **named)
        def wrap(func):
            def rewrapped(*a,**kw):
                if pre_validate is not SENTINEL:
                    pre_validate(*pos,**named)(*a,**kw)
                res = func(*a,**kw)
                if post_validate is not SENTINEL:
                    post_validate(*pos,**named)(*a,**kw)
                return res

            rewrapped.__module__ = func.__module__
            rewrapped.__doc__=func.__doc__  + additionnal_doc
            rewrapped.__name__ = func.__name__
            return rewrapped
        return wrap
    return wraps



That can be used this way :

def keyword_must_contain_key(*key):
    def keyword_must_contain_key(*a,**kw):
        if set(key) & set(kw) != set(key):
            raise Exception("missing key %s in %s" % (
                  set(key)^( set(kw)& set(key)),kw)
            )
    return keyword_must_contain_key


def at_least_n_positional(ceil):
    def at_least_n_positional(*a, **kw):
        if a is not None and len(a) < ceil:
            raise Exception("Expected at least %s argument got %s" % (ceil,len(a)))
    return at_least_n_positional

min_positional= valid_and_doc(at_least_n_positional)
must_have_key = valid_and_doc(keyword_must_contain_key) 

Okay, my code might not get an award for its beauty, but you can test it here https://github.com/jul/check_arg


And at least sphinx automodule accepts the modified docs, and interactive help works too.

Of course, it relies on people naming their functions correctly and having sensible parameter names :P

However, though it sounds ridiculous, I do think that most of our experience comes down to knowing the importance of naming variables, modules, classes and functions correctly.


Conclusion



Since I am not satisfied with the complexity/beauty of the code, I honestly have no idea whether I will package it or even keep working on it. But at least I hope you got the point: what makes optional named arguments difficult to document is only a lack of imagination. :)



Cross dressing on the internet and gender issues

So the current buzz is gender issues.



http://www.kathrineswitzer.com/written_about.shtml
I dare say it is a non-problem. But first, let me tell you my story as a cross-dresser... on the internet.

Once upon a time I signed up for a famous dating site. And, as advised by a friend, I tried a fake feminine account. Well, let me tell you: it feels awkward. You are not in the right place. You notice it immediately because communication is really different. Communication is sexualized.

At least it gave me tips on how to hit on girls: which techniques were working (thanks, fellow men), and which were not. After 3 hours I stopped, analyzed what I had learned about the gender differences, and scripted a very efficient lightweight bot to help improve my dating ratios.

Years later, I became a City of Heroes player. I had two very good reasons to cross-dress again:
- if you have ever played an MMORPG (a meuporg, as we say in my country), you may have noticed that in order to level up, a support player needs to team with a party, and men prefer to team up with girls;
- the camera was always showing your own character, and the feminine 3D models were awesome.

That's how I became half man and half woman. My damage dealers/tanks were men (because they were accepted easily) and my support/control characters were women. I was «love», «tainted love», «true love», and I had a perfect body I could enjoy watching... And even women preferred my masculine-looking damage dealers. Everyone is biased...

Well, being propositioned many times in global channels was disturbing. My chat was feminized only by using huhu and hihi and nothing else, yet talking like a man the rest of the time turned them on. It was weird. So I created a women-only guild so that we wouldn't be bothered, and played with women. Needless to say, the guild was 66% male IRL players (huhu). And I discovered women are not only common but also good players. And we teamed up with feminine guilds too (who knew we were mainly men).

I then decided MMORPGs were really enjoyable but took too much of my time, so I moved to a more fast-paced game known as Urban Terror, where my nick became [SF]Julie. During pickups I was favored even though my level was lower (there are very good female players on UrT, by the way; I was even below their average).
Of course I had trouble on public servers: people trying to hit on me or being sexist, but since I was an admin of our public server, they would get kicked/banned easily. And I dare say that by being quite strict about it, we let girls enjoy playing on our server without being annoyed.

Finally, nowadays on IRC I am an androgynous creature named Julie1 (julie + 1 is pronounced like my real first name in French, but oddly enough people only read the feminine part). So I am still cross-dressing in a way, and let me tell you, it has one advantage: on tech channels I get answers faster than «men» do.

Everybody is pissed off by the bad behaviors, but not by the «unfair positive bias». And that bugs me; gender issues are like a self-reflexive feedback loop. How do we break it?

Speaking of women in free software



First, I apologize for having crashed the Libroscope server. But as you can see here, we at Libroscope were among the first to speak publicly about women in free software. Our method was to let them speak for themselves.

Since I doubt pretty much everything, I was quite dubious about their claims of being discriminated against. But we let them speak, and I listened. Perline claimed, for instance, that the problem with men is that no project can be achieved in a mixed environment, since men take the lead and do not let women express themselves. And I thought to myself: «what a joke!» And then came the Q&A.

Well, for 15 minutes, a male anarchist, very sensitive to women's problems and utterly an activist, explained to everyone his problems as a woman. The actual women present in the room could not get a single word in. It was like a proof by example.

He was trying to shine in his white knight's armor, protecting women's pride. And all my years as a cross-dresser came back to me: men speaking on behalf of women on gender issues is weird. Women being pushed onto conference stages is also weird: it is like when I was accepted into a party based only on my gender. In my opinion it does not help the women's cause, it reinforces the bias.

So my message to men heralding women would be better understood in song, I guess:

Do I have a solution?


Criticism is easy. But I do have a solution. One of our speakers the year before, Benjamin Mako Hill, gave an awesome speech on the first freedom of free software (which notably explained why non-commercial-use clauses are bullshit): the freedom to use, which in its terms is radically non-discriminatory.

The ethics of free software/Open Source are based on action and production. And I think that, as regular free software users, we should consider the deep implications this has:
- not a single discrimination, positive or negative, is acceptable;
- when we install/use/modify a piece of software, do we even care whether it was made by a French person, a Black person, a woman, an alien? No, we don't...

So the Free/Libre/Open Source Software community is equipped to accept women... and all other minorities...

If Perline is right and women-only communities are what it takes to empower a feminine presence in Free Software: please do it. Production and quality are the only things that matter, whatever means you have to use. You have my full support to exclude me from your workshops for as long as it takes you to produce enough software to be respected. Even if it is kind of strange.


On the internet, though, dear women, you should try cross-dressing. And if you want to fully understand men, maybe you should even try cross-dressing on a dating site, in games, on IRC, to understand the biases we all experience...

In fact, I encourage everyone to walk in the other's shoes by cross-dressing on the internet.


One of the things that bugs me, though, is: are we really aiming at the right issues? Behind the gender issue, aren't we missing a broader one? Why was free software shaped the way it was, and what is the invisible barrier that keeps not only women but also a lot of other minorities out of free software? And shouldn't we measure diversity objectively (economic, geographic) so that we can measure the impact of our actions?

I will make a wild guess, however... Free Software is probably regressing in terms of diversity since we are becoming more and more «experts». And we might observe more and more people leaving the way pierre 303 left Stack Exchange.

But as I said before, without a survey we are babbling nonsense: we do not give ourselves the means to measure the impact of our actions or to understand the real nature of the problem.

Using signals as wires?

Having learnt from a book on Unix that signals are software interrupts, I thought: «hey, let's play with signals in python since they are in the stdlib!»

The example below is not what one should really do with signals; it is only for the purpose of study.

Well, remember this point, it will prove important later (quoting http://docs.python.org/dev/library/signal.html#signal.getsignal):
There is no way to “block” signals temporarily from critical sections (since this is not supported by all Unix flavors).

The idea: turning a process into a pseudo hardware component



Signals are like wires that normally carry a rising edge. In a low-level architecture you may use one wire to signal «validate» when results are safe to propagate, and another wire to clear the results.
I build a simple component that just sets to 1 the bit at the nth position of a register, according to which wire/signal fired (it could be used for multiplexing).


Here is the code:

#!/usr/bin/env python3.3 
import signal as s
from time import sleep
from time import asctime as _asctime
from random import randint
import sys

asctime= lambda : _asctime()[11:19]

class Processor(object):
    def __init__(self,signal_map, 
            slow=False, clear_sig=s.SIGHUP, validate_sig=s.SIGCONT):
        self.cmd=0
        self.slow=slow
        self.signal_map = signal_map
        self.clear_sig = clear_sig
        self.validate_sig = validate_sig
        self.value = 0
        self._help = [ "\nHow signal are wired"]
        self._signal_queue = []
        self.current_signal=None

        if validate_sig in signal_map or clear_sig in signal_map:
            raise Exception("Don't wire a signal twice")

        def top_half(sig_no, frame):
            ## UNPROTECTED CRITICAL SECTION 
            self._signal_queue.append(sig_no)
            ## END OF CRITICAL

        for offset,sig_no in enumerate(signal_map):
            s.signal(sig_no, top_half)
            self._help += [ "sig(%d) sets v[%d]=%d"%(sig_no, offset, 1) ]

        self._help += [ "attaching clearing to %d" % clear_sig]
        s.signal(clear_sig, top_half)
        self._help += [ "attaching validating to %d" % validate_sig ]
        s.signal(validate_sig,top_half)
        self._help = "\n".join( self._help)
        print(self._help)

    def bottom_half(self):
        sig_no = self._signal_queue.pop()
        now = asctime()
        seen = self.cmd
        self.cmd += 1
        treated=False
        self.current_signal = None  # reset the signal currently being handled

        if sig_no in self.signal_map:
            offset=self.signal_map.index(sig_no)
            beauty = randint(3,10) if self.slow else 0
            if self.slow:
                print("[%d]%s:RCV: sig%d => [%d]=1 in (%d)s" % (
                    seen,now,sig_no, offset, beauty
                ))
                sleep(beauty)
            self.value |= 1 << offset 
            now=asctime() 
            print("[%d]%s:ACK: sig%d => [%d]=1 (%d)" % (
                seen,now,sig_no, offset, beauty
            ))
            treated=True

        if sig_no == self.clear_sig:
            print("[%d]%s:ACK clearing value" % (seen,now))
            self.value=0
            treated=True

        if sig_no == self.validate_sig:
            print("[%d]%s:ACK READING val is %d" % (seen,now,self.value))
            treated=True

        if not treated:
            print("unhandled execption %d" % sig_no)
            exit(0)

wired=Processor([ s.SIGUSR1, s.SIGUSR2, s.SIGBUS, s.SIGPWR ])

while True:
    s.pause()
    wired.bottom_half()
    sys.stdout.flush()

Now, let's do some shell experiment:
$ ./signal_as_wire.py&
[4] 9332
660 jul@faith:~/src/signal 19:04:03
$ 
How signal are wired
sig(10) sets v[0]=1
sig(12) sets v[1]=1
sig(7) sets v[2]=1
sig(30) sets v[3]=1
attaching clearing to 1
attaching validating to 18

660 jul@faith:~/src/signal 19:04:04
$ for i in 1 12 10 7 18 1 7 30 18; do sleep 1 && kill -$i %4; done
[0]19:04:31:ACK clearing value
[1]19:04:32:ACK: sig12 => [1]=1 (0)
[2]19:04:33:ACK: sig10 => [0]=1 (0)
[3]19:04:34:ACK: sig7 => [2]=1 (0)
[4]19:04:35:ACK READING val is 7
[5]19:04:36:ACK clearing value
[6]19:04:37:ACK: sig7 => [2]=1 (0)
[7]19:04:38:ACK: sig30 => [3]=1 (0)
[8]19:04:39:ACK READING val is 12

Everything works as it should, no? :) I have brilliantly used signals to transmit data asynchronously to a process. With 1 signal per bit \o/

What about «not being able to block signals»?



$ for i in 1 12 10 7 18 1 7 30 18; do echo "kill -$i 9455; " ; done | sh
[0]22:27:06:ACK clearing value
[1]22:27:06:ACK clearing value
[2]22:27:06:ACK: sig7 => [2]=1 (0)
[3]22:27:06:ACK: sig30 => [3]=1 (0)
[4]22:27:06:ACK: sig10 => [0]=1 (0)
[5]22:27:06:ACK: sig12 => [1]=1 (0)
[6]22:27:06:ACK READING val is 15

Oh, a race condition: it appears even though the shell launches the kill commands sequentially. The results are out of order, and you can clearly see that my critical section is not small enough to be atomic. And I lost signals :/

Is python worthless?


Not being able to block signals makes even a top half/bottom half strategy risky. Okay, I should have used only atomic operations in the top half (which makes me wonder which operations are atomic in python), such as setting a single variable and doing the queuing in the while loop, but I fear it would have been worse.

Which actually means that with python you should not play with signals as defined in the stdlib: without blocking you get systematic race conditions, and you risk losing signals if you expect them to be reliable.

I am playing with signals as I would play with an m68K interrupt (except that there I would block the signal before entering the critical section). To achieve the blocking and the processing of pending signals I would need POSIX.1 sigaction, sigset, sigprocmask and sigpending.

Why doesn't python support them (in the stdlib)?

Well, python runs on multiple operating systems; some support POSIX.1 and some don't. Since signals are not standardized the same way outside of POSIX-compliant systems sharing the same POSIX version, it arguably should not be in the st(andar)dlib. And since it is *that* risky, I would advocate not allowing signal handlers in the first place (except for alarm maybe). But take your own risks accordingly :)

If you feel it is a problem, then just remember that binding C code to python is quite easy, and that on a POSIX operating system we have everything we need. This solution given on stackoverflow is funky, but less so than having an unprotected critical section: http://stackoverflow.com/a/3792294/1458574.
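
Note that CPython 3.3 did add pthread_sigmask and sigpending to the signal module on POSIX platforms. If all your target platforms have them, a minimal sketch of blocking around a critical section could look like this (the wiring is made up, the API calls are real):

#!/usr/bin/env python3
# Minimal sketch, assuming a POSIX system and Python >= 3.3:
# block the «wired» signals while touching shared state, then unblock.
# Standard signals keep at most one pending instance, so bursts can
# still be lost, which matches the experiment above.
import os
import signal

WIRES = {signal.SIGUSR1, signal.SIGUSR2}

def top_half(sig_no, frame):
    print("delivered:", sig_no)

for sig in WIRES:
    signal.signal(sig, top_half)

signal.pthread_sigmask(signal.SIG_BLOCK, WIRES)        # enter critical section
try:
    os.kill(os.getpid(), signal.SIGUSR1)               # arrives while blocked...
    print("pending:", signal.sigpending())             # ...so it shows up here
finally:
    signal.pthread_sigmask(signal.SIG_UNBLOCK, WIRES)  # handler runs only now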

My problem with Computer Science pseudo code

I remember opening Computer Science books to learn algorithms when I began. And my first reaction was: this seems unnecessarily complicated, like maths, but I must be an idiot since I learnt physics.

So, 15 years later, I reopened Introduction to Algorithms by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein, and I decided to see whether my coding experience would help me appreciate this book for all the knowledge it could bring me.

So I decided to take a look at the simple heap structure and algorithms:

Let's look at the pseudo code (see here for the whole chapter):

When I see this pseudo code and think of actual production code, I am baffled:

  • it has meaningless variable names;
  • the API and variable names (like the global heap_size) are quite hard to grok; 
  • I wonder whether recursion is really necessary, knowing it is hard to read and debug, and may hit a recursion limit;
  • I have a hard time understanding how it works. 
So I thought: maybe I am just an idiot, and I tried to see whether I could do a better job at writing pseudo code that helps understanding (I used python as a pseudo code checker).

Here is my heapify function (the whole code is here: https://gist.github.com/3927135 if you want to check its correctness):

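# SENTINEL, left_offset and right_offset are helpers defined in the gist linked above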
def heapify(_array,index=1, heap_size=SENTINEL):
    """bubbles down until it is correctly located.
    heap_size can be used internally for optimization purpose"""
    if heap_size == SENTINEL:
        heap_size= len(_array)
    a_child_is_bigger= True
    while a_child_is_bigger:
        largest= index
        left_pos, right_pos = left_offset(index), right_offset(index)

        #check left
        if left_pos < heap_size and _array[left_pos] > _array[index]:
            largest= left_pos

        #check right
        if right_pos < heap_size and _array[right_pos]>_array[largest]:
            largest= right_pos

        if largest == index:
            # we are finally the king of the hill, end of story
            a_child_is_bigger= False
        else:
            #swap with child // bubble down
            _array[index], _array[largest] = _array[largest], _array[index]
            index= largest

And coding it revealed what I feared most: CS academics and software developers don't live in the same world: the way code is written matters.

  1. getting rid of the asymmetric test on line 5 greatly improves the understanding of the logic;
  2. by using full names for test conditions you really help people (yourself included) understand, and thus maintain, your code;
  3. recursion gets in the way of understanding otherwise straightforward code; 
  4. one-letter variable names are really unhelpful;
  5. their code typography is even worse than mine.
I have written python code to check the whole logic: writing slightly more readable code does not seem to prevent it from working (I may have a problem on the boundary, because this hidden heap_size variable is a pain to understand).

By the way, if you want a really efficient heap structure in python, please use http://docs.python.org/library/heapq.html: it is just plain better, superbly documented and tested, and the source code is quite nice too.
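
A quick usage sketch of heapq (a min-heap kept in a plain list):

import heapq

data = [9, 4, 7, 1, 8]
heap = []
for x in data:
    heapq.heappush(heap, x)       # O(log n) insert, keeps the heap invariant

print(heapq.heappop(heap))        # -> 1, always the smallest element
print(heapq.nsmallest(3, data))   # -> [1, 4, 7], without mutating data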

Now I am plain lost: when I check CS papers because I want to solve a problem, I often stumble on unreadable pseudo code and verbiage. Understanding the API, the variable names... takes an awful lot of time. When I write python I have the feeling it is a pseudo code translator, so if we assume my implementation is visually and logically close to pseudo code, does it cost that much to improve pseudo code readability?

If I can do it, knowing that I am one of the most stupid developers among the ones I know, why or how can't they do the same? Pseudo code is meant for sharing knowledge; do we really need to waste time deciphering it to become real or better developers?

When you see equivalent code in an industry-grade application, you normally hate it. It is plain bad practice. Isn't computer science supposed to train elite developers?

And when I see such code, I sometimes wonder whether the author understood what he wrote.

Meaningful variable names, clear test conditions and readability are what matter, because code is meant to be improved and fixed.

With all these questions in my head I came to a conclusion: I don't take for profound what I find needlessly obscure, so my opinion of CS papers and academics has dropped dramatically.


EDIT: I now see that bubbling down and bubbling up can be extracted from the logic, and I guess this can be used to make a heap that supports efficient min/max insertion/extraction.


Unicode is tough

So today I dared to open a bug on python. Which should, at some point, make me feel mortified, since it proved that I had misunderstood what a character is.

The point was in python 3.2:
foo⋅bar=42
#  File "stdin", line 1
#    foo⋅bar=42
#            ^
#SyntaxError: invalid character in identifier
### This is another bug that is not in the scope of the post
### http://bugs.python.org/issue2382
print(ord("foo⋅bar"[3]))
# 8901
foo·bar = 42
print(ord("foo·bar"[3]))
# 183

A dot is a punctuation mark, no? And variable names shouldn't use punctuation.
Plus the two look the same; shouldn't they be considered the same?

So I opened a bug, and I was very nicely pointed to the fact that the unicode character "MIDDLE DOT" is indeed punctuation, but that it also has the unicode property Other_ID_Continue. And, as stated in python's rules for identifiers, it is therefore totally legitimate.

That is the point where you actively search for good documentation to understand what malfunctioned in your brain. Then a Perl coder pointed me to Perl Unicode Essentials by Tom Christiansen. Even if the first third is about Perl, it is the best presentation on unicode I have read so far.


And then I understood my mistakes:
  • I (visually) confused a glyph with a character: the same glyph can be used for different characters (a quick check with unicodedata, shown below, makes the difference visible);
  • unicode is much more than simply extending the set of usable glyphs (that I knew, but I had not grasped how little I knew).
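
The two look-alike dots from above, inspected with the stdlib:

import unicodedata

for ch in "\u22c5\u00b7":
    print(hex(ord(ch)),
          unicodedata.name(ch),      # DOT OPERATOR vs MIDDLE DOT
          unicodedata.category(ch))  # Sm (math symbol) vs Po (punctuation)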

By the way, if you need a reason to switch to the current production version 3.3.0, remember that Py3.3 is still improving its unicode support:

py3.2 :
"ß".upper()
# ß  

which is a wrong result, while in py3.3:

"ß".upper()
# SS  
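
(py3.3 also added str.casefold, the right tool when you want caseless comparison:)

"ß".casefold()
# ss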


Fun with signal processing and ... matplotlib

Well, I remember overhearing a conversation in a laboratory some 10 years ago.

Electronics engineers were proud of having helped their team reach a higher CCD resolution with a trick they published:
they would use a median filter to separate the photons arriving on the CCD from the thermal noise. They were even very proud of seeing a signal far smaller than the noise.

I thought to myself: such a scam, this is too stupid, it cannot work.

So I decided to play with matplotlib to test it.

First we generate white noise; then, with a probability of 1/n, a photon is seen, its amplitude being 1/A of the signal.

Then I asked #python-fr for advice on the best way to do a moving average (the usual filter for smoothing out signals), and then I tried the median filter.

A median filter is simply taking the median over a moving window on a time series.

It can't work, can it?

Well, it does.

It is not hard to code, but beware: your eyes will cry from the PEP8 insanity and the matplotlib style mixed with pythonic style (call it art):
https://gist.github.com/3794506
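
A stripped-down sketch of the experiment (assuming numpy; the window size and probabilities here are arbitrary, not the tuned 40/75 values mentioned below):

import numpy as np

n, win = 5000, 41
noise = np.random.normal(0.0, 1.0, n)          # thermal noise
photons = (np.random.random(n) < 0.01) * 0.2   # rare, weak photon hits
trace = noise + photons

def moving_average(x, w):
    return np.convolve(x, np.ones(w) / w, mode="same")

def moving_median(x, w):
    half = w // 2
    padded = np.pad(x, half, mode="edge")
    return np.array([np.median(padded[i:i + w]) for i in range(len(x))])

averaged = moving_average(trace, win)
medianed = moving_median(trace, win)
# feed `averaged` and `medianed` to matplotlib and judge for yourself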



Moral of the story? 

 

Well, I should publish the code that shows why 40 for the median and 75 for the moving average are the optimal settings, and how dazzlingly fast matplotlib is.

But the real point is that python + matplotlib are amazing, and that sometimes simple ideas give amazing results.

Have fun!

The limitations of frameworks.

In my young days when I began coding, the synonym for code reusability was libraries in procedural languages and classes in object oriented languages, then it became modules or software packages, and afterwards it became frameworks.

Trends change and the meanings of these words differ, but the goal stays the same: doing more with less code, shared with others.

I could give a very abstract presentation of the limitations of frameworks, but instead let's talk about code as if it were literature.


What is a language?



I learn computer languages the way I learn foreign languages: I like to focus on the typical expressions that make the language interesting, and I don't try to translate word for word. These are the idiomatic expressions. Some languages put the stress on manipulating vectors of data as a single variable, others on being terse, others on letting you express things the way you want.

To be used efficiently, a language has to be embraced as a whole, as a frame set. And the reward for accepting the language is new ideas and new ways to express yourself. In my view, a programmer prefers one language over the others as a matter of aesthetics. That's why language wars are so heated and opinionated.


What is a program?



A program is a story you tell in a language, and some stories sound better in some languages. For instance, erlang or node.js seem better suited to telling asynchronous stories. R seems better suited to telling a story about how a series of data can be interpreted as information...

A program is an essay: your customer sets a stage with characters, one or more places and some events, and wants you to build a story that leads to the desired conclusion. In the middle you are alone with a sort of bible, given by the customer, with everything relevant to set the stage; no one cares how you do it, but you have to tell a consistent story driving the characters to the desired ending.


What is a pattern?


A (design) pattern is like a cliché in a story. When you see a hero saving a nice lady in distress, you expect them to kiss in the end. Well, as a coder I mainly think this way: given an initial configuration of data and a desired outcome, I sometimes have a feeling of déjà vu and will lazily tell my story in an obvious way (which may not be obvious to a duchess, for it is sometimes crude). This is not code, since the words may vary. We all have a style, or constraints given by the customer, that may rule out a crude cut'n'paste. So patterns are not about actual code; they are about a global direction.

It is like when you want to make a gag in a movie: you can rely on misunderstandings, running gags, misdirection... These are mechanisms you can lean on, but you still have to describe the situation and write the dialogue; you cannot just lift a sketch from another movie.

Wrapping up and stating what a framework is



I see programming as writing an essay under constraints set by an editor. The editor wants us to write in a genre, with given elements, a desired conclusion and a deadline. We could write anything in the world if we had enough time, but since our schedules are tight we have to be terse, fast, and to reuse patterns and code.

Frameworks are mainly story generators. 

Do you know these TV shows with stereotyped characters, stereotyped situations, and stereotyped development according to a «bible»? Do you remember Starsky & Hutch, J.A.G. (and all the Bellisario productions), Law & Order: Special Victims Unit...
These TV shows are to literature what frameworks are to computer languages: a restriction on what you can do with your imagination. There were enjoyable shows among them (Wild Wild West for instance), and it is not a bad thing. Of course, some screenwriters made TV shows evolve by not telling the same story over and over during the whole run while still staying consistent: I still have fond memories of Twin Peaks, or Babylon 5.

I am not saying frameworks are bad: we need them, because of the deadlines.

I am saying they limit our options to already existing stories. As a coder, I think that writing a story that already exists is a waste of time: adapt an existing story (buy software) if you want something close to an existing one. When you make variations to an existing show, you take the risk of altering the conceptual integrity of the story. In programming I call this «disengaging from a framework», and as far as I am concerned it is tough. I remember having to add a custom authentication mechanism to turbogears for a flash application, and it took me two weeks to get it right.

My experience is that the more constraints a framework imposes, and the less idiomatic it is (with regard to the language it is coded in), the harder it will be to disengage from.


Why does disengaging from a framework matter?



I remember why I once decided I would never code in python.

It was in 2004. I was chairing a topic at the Free Software Meeting and, since the organisation was eating its own dog food, the website ran on a python CMS framework (Plone?). I pride myself on understanding languages quickly because they all share common roots (the more languages you know, the easier it is to learn new ones), and I had a problem: speeches were not sorted in alphabetical order but by DB id. I thought it would be easy to find where the list was sorted and introduce the right ordering function (see the trivial sketch below). After 6 hours I was still unable to figure out how it worked, because it was not a language, it was a frame set that was very hard to grok. Frameworks are not a language: they are even more powerful, but so much less expressive.
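
For the record, in plain python the fix itself is a one-liner; a hypothetical sketch with (db_id, title) pairs standing in for whatever the CMS stored:

# hypothetical speeches as (db_id, title) pairs
speeches = [(3, "Zope internals"), (1, "Debian packaging"), (2, "OCaml for fun")]
by_title = sorted(speeches, key=lambda s: s[1].lower())  # alphabetical, not DB id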

But when I told you that customers come with strict situations, characters and definitive ideas of what they want: I lied.

"Programmers don't burn out on hard work, they burn out on change-with-the-wind directives and not 'shipping'."
Mark Berry 
Customers are the biggest liars: they always want something slightly different from the initial specification, and different from what the framework you have chosen proposes. Thus you will have to disengage sooner or later.



Can we improve the situation on the code side?


Well, the situation has already been improving for quite a while.

What I have described so far as frameworks are jumbo web frameworks like Ruby on Rails, symfony, Django, turbogears, Plone, ASP.NET...

Thanks to developers such as Blake Mizerany (who, I guess, inspired Armin Ronacher for flask) and numerous others, we already have lightweight frameworks: ramaze (ruby), sinatra (ruby), play (java), flask (python), dancer (perl)... Frameworks are not bad, they are fantastic tools for writing less code, but they freeze you into a mindset while our added value lies in our agility. Thus the main quality of a modern framework (in my opinion) is its flexibility.
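
To show what «lightweight» buys you, here is a minimal flask sketch (the route and message are made up, the API calls are real):

from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello, lightweight world!"

if __name__ == "__main__":
    app.run()  # serves on http://127.0.0.1:5000/ by default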

Furthermore, over the last few years I have met developers who have almost always coded with a single framework. I noticed that some of them hardly knew the finer points of the language they code in (I like to call them monkey coders). As for the others, I wonder what they will become when their framework becomes obsolete. Will they still be able to code in python (ruby, PHP, Perl or whatever)? Maybe Zed Shaw was wrong: Ruby on Rails was not a ghetto, maybe all jumbo frameworks are ghettos.

Like the dealers in «The Wire» who have spoken their slang and lived in their culture for too long, and are hardly able to leave their former life as dealers or junkies.


PS: thanks to bruno for the corrections
«The Wire» is an amazing show
If I were a rubyist I would code websites with ramaze
If I were a perl coder I would use dancer
And whatever kind of coder I were, I would not use PHP, because it is neither consistent nor terse.