Discussion:
[PROPOSAL] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack
Matt Foley
2012-11-21 19:15:29 UTC
This discussion started in
HADOOP-8924<https://issues.apache.org/jira/browse/HADOOP-8924>
, where it was proposed to replace the build-time utility "saveVersion.sh"
with a python script. This would require Python as a build-time
dependency. Here's the background:

Those of us involved in branch-1-win, the port of Hadoop to Windows without
Cygwin, have run into the frequent use of shell scripts throughout the
system, both at build time (e.g., the utility "saveVersion.sh") and at
run time (config files like "hadoop-env.sh" and the start/stop scripts
in "bin/*"). Similar usages exist throughout the Hadoop stack, in all
projects.

The vast majority of these shell scripts do not do anything platform
specific; they can be expressed in a posix-conforming way. Therefore, it
seems to us that it makes sense to start using a cross-platform scripting
language, such as python, in place of shell for these purposes. For those
rare occasions where platform-specific functionality really is needed,
python also supports quite a lot of platform-specific functionality on both
Linux and Windows; but where that is inadequate, one could still
conditionally invoke a platform-specific module written in shell (for
Linux/*nix) or powershell or bat (for Windows).
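
To illustrate the conditional dispatch described above, here is a minimal
sketch. It is not code from the branch-1-win port; the function name, the
".cmd"/".sh" naming scheme, and the directory layout are all assumptions
made up for the example.

```python
import os


def helper_command(script_dir, base_name, os_name=os.name):
    """Build the command line for a platform-specific helper script.

    os_name defaults to the running interpreter's os.name ("nt" on Windows,
    "posix" on Linux and other Unix-like systems).
    """
    if os_name == "nt":
        # Hypothetical Windows helper, run through the command interpreter.
        return ["cmd", "/c", os.path.join(script_dir, base_name + ".cmd")]
    # Hypothetical POSIX helper, run through sh.
    return ["sh", os.path.join(script_dir, base_name + ".sh")]
```

The shared logic would stay in Python, and only the returned command (passed
to something like subprocess.check_call) would differ per platform.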

The primary motive for moving to a cross-platform scripting language is
maintainability. The alternative would be to maintain two complete suites
of scripts, one for Linux and one for Windows (and perhaps others in the
future). We want to avoid the need to update dual modules in two different
languages when functionality changes, especially given that many Linux
developers are not familiar with powershell or bat, and many Windows
developers are not familiar with shell or bash.

Regarding the choice of python:

- There are already a few instances of python usage in Hadoop, such as
the utility (currently broken) "relnotes.py", and massive usage of python
in the examples/ and contrib/ directories.
- Python is also used in Bigtop build-time.
- The Python language is available for free on essentially all
platforms, under an Apache-compatible
license<http://www.apache.org/legal/resolved.html>.

- It is supported in Eclipse and similar IDEs.
- Most importantly, it is widely accepted as a reasonably good OO
scripting language, and it is easily learned by anyone who already knows
shell or perl, or other common scripting languages.
- On the Tiobe index of programming language
popularity<http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html>,
which seeks to measure the relative number of software engineers who know
and use each language, Python far exceeds Perl and Ruby. The only more
well-known scripting languages are PHP and Visual Basic, neither of which
seems a prime candidate for this use.

For build-time usage, I think we should immediately approve python as a
build-time dependency, and allow people who are motivated to do so, to open
jiras for migrating existing build-time shell scripts to python.

For run-time, there is likely to be a lot more discussion. Lots of folks,
including me, aren't real happy with use of active scripts for
configuration, and various others, including I believe some of the Bigtop
folks, have issues with the way the start/stop scripts work. Nevertheless,
all those scripts exist today and are widely used. And they present an
impediment to porting to Windows-without-cygwin.

Nothing about run-time use of scripts has changed significantly over the
past three years, and I don't think we should hold up the Windows port
while we have a huge discussion about issues that veer dangerously into
religious/aesthetic domains. It would be fun to have that discussion, but I
don't want this decision to be dependent on it!

So I propose that we go ahead and also approve python as a run-time
dependency, and allow the inclusion of python scripts in place of current
shell-based functionality. The unpleasant alternative is to spawn a bunch
of powershell scripts in parallel to the current shell scripts, with a very
negative impact on maintainability. The Windows port must, after all, be
allowed to proceed.

Let's have a discussion, and then I'll put both issues, separately, to a
vote (unless we miraculously achieve consensus without a vote :-)

I also encourage members of the other Hadoop-related projects, to carry
this discussion into those forums. It would be very cool to agree on a
whole-stack solution for the scripting problem.

Best regards,
--Matt
Alejandro Abdelnur
2012-11-21 19:25:04 UTC
Hey Matt,

We already require java/mvn/protoc/cmake/forrest (forrest is hopefully on
its way out with the move of docs to APT)

Why not do a maven-plugin to do that?

Colin already has something to simplify all the cmake calls from the builds
using a maven-plugin (https://issues.apache.org/jira/browse/HADOOP-8887)

We could do the same with protoc, thus simplifying the POMs.

The saveVersion.sh seems like another prime candidate for a maven plugin,
and in this case it would not require external tools.

Does this make sense?

Thx
--
Alejandro
Matt Foley
2012-11-21 19:44:18 UTC
Hi Alejandro,
For build-time issues in branch-2 and beyond, this may make sense (although
I'm concerned about obscuring functionality in a way that only maven
experts will be able to understand). In the particular case of
saveVersion.sh, I'd be happy to see it done automatically by the build
tools.

However, for build-time issues in the non-mavenized branch-1, and for
run-time issues in both worlds, the need for cross-platform scripting
remains.

Thanks,
--Matt
Alejandro Abdelnur
2012-11-21 19:58:49 UTC
Got it, thx.

BTW, for branch-1, how about an ant task, run as part of the build, that
does that?

Thx
--
Alejandro
Konstantin Boudnik
2012-11-21 20:00:06 UTC
I like Alejandro's idea about Maven for a few reasons:
- bringing in a scripting environment which is known for its inter-version
idiosyncrasies just because Windows can't handle trivial shell scripting
looks like overkill to me
- related to the above, there's a chance that the Python prerequisites used in
Hadoop might conflict with some other components in the stack.
This would be a nightmare for integrator projects such as Bigtop
- Maven is the de facto standard for Java stacks
- Maven can run a scripting language (Groovy, via a plugin) if existing plugins
aren't sufficient for achieving whatever goals

Addressing Matt's later point about the non-Mavenized Hadoop-1 line: it uses
Maven stuff such as deploy/install via custom ant tasks. The same approach
would work for saveVersion.sh and others, I am sure.

Cos
Chris Nauroth
2012-11-21 21:03:23 UTC
I worked on some of the Python build scripting that currently resides in
branch-trunk-win. Initially, my goal was to keep a "pure" Maven
implementation to the greatest degree possible without external scripting,
but I encountered a few problems:

1. One approach is to try to express all of the build logic with existing
Maven plugins. This turned out to be infeasible in some cases. I don't
know of an existing plugin that does anything like the logic in
saveVersion.sh/.py for walking the source tree and checksumming the files.
For protoc, I saw a proposed plugin in open source, but it hadn't reached
release status yet. For creation of the distribution tarballs, the Maven
Ant Plugin (and actually the underlying Ant tool) cannot preserve file
permissions or symlinks.

2. Considering that the first approach isn't possible, another possibility
is to write custom Maven plugins. This would require significantly more
engineering time to write and test the code. I think there are some
legitimate concerns too about supportability, because this approach would
put significant build logic into Maven plugin code instead of something
more easily visible to release engineers, like pom.xml and external
scripts. Also, I'm actually not sure that we can implement everything with
a Maven plugin. For example, I mentioned the problem of preserving file
permissions and symlinks in the distribution tarballs. Ant hasn't been
able to fix that problem due to a Java limitation, so our Maven plugins
coded in Java (or another JVM language) likely would suffer the same fate.
We might be stuck with some amount of external scripting no matter what.
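
For readers who have not looked at the script, the "walk the source tree and
checksum the files" step mentioned in point 1 can be sketched in a few lines
of Python. This is an illustration only, not the actual saveVersion.sh/.py
logic: the function name, the ".java" filter, and the choice of SHA-256 are
assumptions.

```python
import hashlib
import os


def tree_checksum(root, suffix=".java"):
    """Return one SHA-256 hex digest covering every matching file under root."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # fix the traversal order so the result is reproducible
        for name in sorted(filenames):
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            # Hash the relative path with forward slashes so Linux and
            # Windows builds of the same tree produce the same digest.
            relative = os.path.relpath(path, root).replace(os.sep, "/")
            digest.update(relative.encode("utf-8"))
            with open(path, "rb") as handle:
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
    return digest.hexdigest()
```

Files are read in binary mode, so line-ending conversion on checkout would
still change the digest between platforms.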

Thank you,
--Chris
Radim Kolar
2012-11-21 21:30:14 UTC
Post by Chris Nauroth
For creation of the distribution tarballs, the Maven
Ant Plugin (and actually the underlying Ant tool) cannot preserve file
permissions or symlinks.
The Maven assembly plugin can deal with file permissions. Not sure about
symlinks. I do not remember the dist tar having symlinks inside.
Chris Nauroth
2012-11-21 21:44:37 UTC
Sorry, to clarify my point a little more, Ant does allow you to make
declarations to explicitly set the desired file permissions via the
fileMode attribute of a tarfileset. However, it does not have the
capability to preserve whatever permissions were naturally created on files
earlier in the build process. This is a difference in maintainability, as
adding new files to the build may then require extra maintenance of the Ant
directives to apply the desired fileMode. This is an easy thing to
overlook. A solution that preserves the natural permissions requires less
maintenance overhead.

I couldn't find a way to make the assembly plugin preserve permissions like
this either. It just has explicit fileMode directives similar to Ant.
(Let me know if I missed something though.)

To see symlinks show up in distribution tarballs, you need to build with
the native components, like libhadoop.so or bundled Snappy.
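
As a point of comparison, here is a minimal sketch of what "preserving the
natural permissions" looks like with Python's standard tarfile module. It is
not taken from the branch-trunk-win scripts; the function name and arguments
are assumptions for the example.

```python
import tarfile


def make_dist_tarball(source_dir, tar_path, top_name):
    """Pack source_dir into a gzip tarball, keeping on-disk modes and symlinks."""
    with tarfile.open(tar_path, "w:gz") as tar:
        # tarfile does not dereference symlinks by default, so they are
        # stored as links, and each member's mode is read from the file
        # system rather than declared in the build script.
        tar.add(source_dir, arcname=top_name)
```

This only helps on a POSIX host: a Windows file system does not record Unix
permission bits, so there is nothing to preserve there.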

Thanks,
--Chris
Radim Kolar
2012-11-21 23:15:08 UTC
Post by Chris Nauroth
Sorry, to clarify my point a little more, Ant does allow you to make
declarations to explicitly set the desired file permissions via the
fileMode attribute of a tarfileset.
There are just 2 directories, /bin and /sbin, with executable files. It's
probably possible to set the file mode per directory in the Maven assembly
plugin.
Chris Nauroth
2012-11-22 00:14:58 UTC
Unfortunately, there are a couple of spots where it gets really messy and
directory-wide rules fail to cover it. The trickiest maintenance issue is
hadoop-hdfs-httpfs, where we unpack and repack a Tomcat. Initially, I
tried to do this using only the ant plugin, but I wound up with a ton of
different tarfileset directives with different fileMode values to reapply
the same permissions that were present in the original Tomcat distribution.
This also would have been a brittle solution, because changes in the
Tomcat package would risk invalidating our ant rules. A solution that
preserves the original permissions reduces this kind of maintenance work.

Thanks,
--Chris
Radim Kolar
2012-11-22 01:55:37 UTC
Post by Chris Nauroth
The trickiest maintenance issue is hadoop-hdfs-httpfs, where we unpack and repack a Tomcat.
Why is it not possible to just ship a WAR file? It seems to be a special
purpose app, and it needs hand security setup anyway, plus integration
with existing firewall/web infrastructure.

Did you consider using Jetty? It has really good maven support:
http://wiki.eclipse.org/Jetty/Feature/Jetty_Maven_Plugin
I am using jetty 8 instead of tomcat and run it with java -jar start.jar;
no extra file permissions like the x bit are needed.

If you really need to create a tar by hand, there is a java library for
doing it - http://code.google.com/p/jtar/ - and it can be used from any
JVM based script language, so you have plenty of choices.
Chris Nauroth
2012-11-22 02:40:08 UTC
This predates me, so I don't know the rationale for repackaging Tomcat
inside HTTPFS. I suspect that there was a desire to create a fully
stand-alone distribution package, including a full web server. The Maven
Jetty plugin isn't directly applicable to this use case. I don't know why
it was decided to use Tomcat instead of Jetty. (If anyone else out there
has the background, please respond.) Regardless, if the desire is to
package a full web server instead of just the war, then switching to Jetty
would not change the challenges of the build process. We'd still need to
preserve whatever permissions are present in the Jetty distribution.

In general, when I was working on this, I did not question whether the
current packaging was "correct". I assumed that whatever changes I made
for Windows compatibility must yield the exact same distribution without
changes on currently supported platforms like Linux. If there are
questions around actually changing the output of the build process, then
that will steer the conversation in another direction and increase the
scope of this effort.

It seems like the trickiest issue is preservation of permissions and
symlinks in tar files. I suspect that any JVM-based solution like custom
Maven plugins, Groovy, or jtar would be limited in this respect. According
to Ant documentation, it's a JDK limitation, so I suspect all of these
would have the same problem. I haven't tried any of them though. (If
there was a feasible solution, then Ant likely would have incorporated it
long ago.) If anyone wants to try though, we might learn something from
that.

Thank you,
--Chris
Steve Loughran
2012-11-22 09:02:57 UTC
Permalink
Post by Chris Nauroth
It seems like the trickiest issue is preservation of permissions and
symlinks in tar files. I suspect that any JVM-based solution like custom
Maven plugins, Groovy, or jtar would be limited in this respect. According
to Ant documentation, it's a JDK limitation, so I suspect all of these
would have the same problem. I haven't tried any of them though. (If
there was a feasible solution, then Ant likely would have incorporated it
long ago.) If anyone wants to try though, we might learn something from
that.
Thank you,
--Chris
You are limited by what File.canRead(), canWrite() and canExecute() tell you.

The absence of a way to detect file permissions in Java is because of the
lowest-common-denominator approach of the Java FS APIs, supporting FAT32
(odd case logic, no perms or symlinks), NTFS (odd case logic, ACLs over
perms, symlinks historically very hard to create), HFS+ (case insensitive
unix fs!) as well as classic unixy filesystems.

Ant <tarfileset> filesets in <tar> let you spec permissions on filesets you
pull into the tar; they are generated x-platform, which is the other reason
why you declare them in <tar> - you have the right to generate proper tar
files even if you use a Windows box.

Symlinks are problematic - even detecting them cross-platform is pretty
unreliable. To really do them you'd need to add a new <symlinkfileset>
entity for <tar> that would take the link declaration. I could imagine how
to do that - and if stuck into the hadoop tools JAR, it wouldn't even depend on
a new version of Ant.
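
As a hedged illustration of declaring permissions and symlinks in the archive itself, rather than reading them from the build machine's filesystem, here is a minimal sketch using Python's standard tarfile module. It is not the Ant <tarfileset> mechanism described above; the build_tar helper and the entry names (bin/start.sh, bin/current) are hypothetical.

```python
import io
import tarfile


def build_tar(entries):
    """Build an uncompressed tar archive in memory.

    entries: list of (name, data, mode, link_target) tuples, where data is
    bytes for a regular file, and link_target is set only for symlinks.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data, mode, link_target in entries:
            info = tarfile.TarInfo(name=name)
            info.mode = mode  # permission bits are declared, not read from disk
            if link_target is not None:
                # A symlink entry carries only a header with the link target.
                info.type = tarfile.SYMTYPE
                info.linkname = link_target
                tar.addfile(info)
            else:
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Hypothetical entries: one executable script and one symlink pointing at it.
archive = build_tar([
    ("bin/start.sh", b"#!/bin/sh\necho starting\n", 0o755, None),
    ("bin/current", None, 0o777, "start.sh"),
])
```

Because the mode bits and the link target are written into each tar header directly, the result does not depend on whether the build host's filesystem can represent them.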

Maven just adds extra layers in the way.

-Steve
Radim Kolar
2012-11-22 14:54:27 UTC
Permalink
We'd still need to preserve whatever permissions are present in the Jetty distribution.
In the Jetty distribution there is just one shell startup script, and you can
even run Jetty without it using the auto-startable jar. A requirement to
preserve permissions is overkill; at most you need to chmod +x one
script. In Tomcat it would be similar.
Maven plugins, Groovy, or jtar would be limited in this respect.
In jtar you are manipulating the resulting tar file directly:

http://code.google.com/p/jtar/source/browse/#svn%2Ftrunk%2Fjtar%2Fsrc%2Fmain%2Fjava%2Forg%2Fxeustechnologies%2Fjtar
Matt Foley
2012-11-21 21:14:16 UTC
Permalink
Cos,
Please see in-line.
Post by Konstantin Boudnik
- bringing in a scripting environment which is known for its
inter-version idiosyncrasies just because Windows can't handle trivial
shell scripting looks like an overkill to me
Excuse me? Can we at least try not to belittle other people's platforms on
a public Apache forum? There's nothing trivial about implementing shell on
Windows, as cygwin regrettably proved.
Post by Konstantin Boudnik
- relative to above, there's a chance that Python's pre-requisites used in
Hadoop might get into a conflict with some other components in the stack.
This will be a nightmare for the integrator projects i.e. Bigtop
Said Bigtop project actually uses python, does it not?
Post by Konstantin Boudnik
- Maven is de-facto standard for Java stacks
Sure -- except for when Ant was the de-facto standard for Java stacks. And
let's remember what maven and ant are/were the de-facto standard for:
Doing builds. Not scripting everything that needs scripting.
Post by Konstantin Boudnik
- Maven has built-in scripting language (Groovy) if some plugins aren't
sufficient for achieving whatever goals
Are you proposing Groovy as a better scripting language than Python?
Post by Konstantin Boudnik
Addressing Matt's later point about non-Mavenized Hadoop-1 line: it uses Maven
stuff such as deploy/install via custom ant tasks. Same approach would work
for saveVersion.sh and others, I am sure.
Current ant scripts in Hadoop seem to use maven only for artifact
management via the maven repository. If I'm missing something, please
point it out. The ant build task currently calls out to saveVersion.sh.
Having it call out to maven, which then calls out to a plug-in and/or a
Groovy script, doesn't sound like an improvement to me. And it's a way
different use of maven than currently in the Hadoop-1 line, not a
continuation of established practice.

--Matt
Konstantin Boudnik
2012-11-21 21:50:45 UTC
Permalink
Ditto...
Post by Matt Foley
Cos,
Please see in-line.
Post by Konstantin Boudnik
- bringing in a scripting environment which is known for its
inter-version idiosyncrasies just because Windows can't handle trivial
shell scripting looks like an overkill to me
Excuse me? Can we at least try not to belittle other people's platforms on
a public Apache forum? There's nothing trivial about implementing shell on
Windows, as cygwin regrettably proved.
Belittle? Hardly ;) Because we all know very well why shell is so awkward to
implement on any Windows system.
Post by Matt Foley
Post by Konstantin Boudnik
- relative to above, there's a chance that Python's pre-requisites used
in Hadoop might get into a conflict with some other components in the
stack. This will be a nightmare for the integrator projects i.e. Bigtop
Said Bigtop project actually uses python, does it not?
It does, Matt. The main concern I have is that at some point Hadoop's Python might
all of a sudden be of a different version than the one in Bigtop. And all
hell will break loose compatibility-wise. What would be the solution then?
Post by Matt Foley
Post by Konstantin Boudnik
- Maven is de-facto standard for Java stacks
Sure -- except for when Ant was the de-facto standard for Java stacks. And
Arguable. Yet beside the point.
Post by Matt Foley
Doing builds. Not scripting everything that needs scripting.
Arguable as well, due to the very definition of a build system.
Post by Matt Foley
Post by Konstantin Boudnik
- Maven has built-in scripting language (Groovy) if some plugins aren't
sufficient for achieving whatever goals
Are you proposing Groovy as a better scripting language than Python?
I am proposing that Groovy is a better language than Python. Because, in part, it
goes far beyond scripting. And it doesn't have permanent runtime backward
compatibility issues. When was the last time the JDK had backward compatibility
problems?
Post by Matt Foley
Post by Konstantin Boudnik
Addressing Matt's later point about non-Mavenized Hadoop-1 line: it uses Maven
stuff such as deploy/install via custom ant tasks. Same approach would work
for saveVersion.sh and others, I am sure.
Current ant scripts in Hadoop seem to use maven only for artifact
management via the maven repository. If I'm missing something, please
point it out. The ant build task currently calls out to saveVersion.sh.
Having it call out to maven, which then calls out to a plug-in and/or a
Groovy script, doesn't sound like an improvement to me. And it's a way
At least it is guaranteed to work everywhere. And all we need in this case is
an extra jar file that can be pulled down through the same ivy/maven
dependency mechanism.

In the case of Python you'd have to make sure that you have the right version
of the interpreter and runtime. And you will have to do it manually or have an
extra requirement expressed via a system maintenance DSL.
Post by Matt Foley
different use of maven than currently in the Hadoop-1 line, not a
continuation of established practice.
The main point of my argument, expressed in fewer than 100 words: adding
Python, which is inconsistent across different Linux distros and has a history
of backward incompatibilities (2.6 vs 2.5, 3.0 vs earlier, etc.), doesn't seem
to be justified by the benefit of having a somewhat easier build on Windows.

Perhaps we can do a more formal benefit analysis by just comparing the
number of Hadoop installations on MS Windows vs. Unix.

Cos
Andy Isaacson
2012-11-21 23:00:06 UTC
Permalink
Post by Konstantin Boudnik
The main point of my argument, expressed in fewer than 100 words: adding
Python, which is inconsistent across different Linux distros and has a history
of backward incompatibilities (2.6 vs 2.5, 3.0 vs earlier, etc.), doesn't seem
to be justified by the benefit of having a somewhat easier build on Windows.
This seems mostly like a red herring to me. I'll grant that it's hard
to build a complete Python stack that's compatible between Python 2.x
and 2.y, but it's very easy by following best practices to keep python
*scripts* compatible across all reasonable Python 2.x versions.
Simply pick an oldest-supported-version like 2.4 and be reasonably
disciplined to not use newer constructs or libraries. I wouldn't wish
to try to build a complete system in such a limited dialect [1], but
for "we need a reasonable replacement for /bin/sh scripts" it's just
fine.
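
A minimal sketch of what enforcing such a floor could look like at the top of a script, assuming 2.4 as the oldest supported version (the example figure above). The helper names version_ok and require_version are hypothetical, and the code deliberately sticks to constructs that old interpreters also accept.

```python
import sys

# Assumed floor, taken from the "like 2.4" example; adjust to the real policy.
MIN_VERSION = (2, 4)


def version_ok(version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor, ...) tuple meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


def require_version(version_info=None, minimum=MIN_VERSION):
    """Exit with a clear message if the interpreter is older than the floor."""
    if version_info is None:
        version_info = sys.version_info
    if not version_ok(version_info, minimum):
        sys.stderr.write(
            "This script needs Python %d.%d or newer; found %d.%d\n"
            % (minimum[0], minimum[1], version_info[0], version_info[1])
        )
        sys.exit(1)


# Called once at startup, before any version-sensitive work is done.
require_version()
```

Failing early with an explicit message is cheaper to support than a syntax error or a missing-module traceback surfacing halfway through a build.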

Ignore Python 3 for the time being, it's a completely different
language with incompatible syntax and semantics that doesn't support
several currently-important platforms. Maybe in a few years sane
people can consider moving to it, but for now it's best to just stick
with the compatible subset of Python 2.x.

[1] the Mercurial project has had a pretty good experience with this
scheme; http://mercurial.selenic.com/wiki/SupportedPythonVersions they
currently support 2.4 - 2.7 with a few required libraries. They
dropped 2.2 and 2.3 support a few years ago due to specific
shortcomings on those versions.

-andy
Radim Kolar
2012-11-21 23:58:41 UTC
Permalink
/Ignore Python 3 for the time being, it's a completely different
language with incompatible syntax and semantics that doesn't support
several currently-important platforms. Maybe in a few years sane people
can consider moving to it, but for now it's best to just stick with the
compatible subset of Python 2.x. [1] the Mercurial project has had a
pretty good experience with this scheme;
http://mercurial.selenic.com/wiki/SupportedPythonVersions they currently
support 2.4 - 2.7 with a few required libraries. They dropped 2.2 and
2.3 support a few years ago due to specific shortcomings on those versions./

I know that Python compatibility can be worked around. I used Python for
a few years and wrote about 70k LOC in it, until it started to irritate me
that every new version has incompatibilities (such as 2.4 vs 2.3 vs 2.5),
which makes maintaining and testing way harder than it should be. It's
not just compatibility with missing library functions; sometimes even an
expression evaluated to a different value under a new version. This was
similar to the PHP 4 to PHP 5 migration. Today I have 3 versions of Python
installed because of software requirements.

For simple scripts it can probably work if you stick to some common subset.

Scripting via a Maven plugin has the advantage that the user does not need to
install anything; there are a couple of languages available: Scala, Groovy,
Jelly, JRuby. Maybe Jython too.
Konstantin Boudnik
2012-11-22 01:46:26 UTC
Permalink
Post by Radim Kolar
I know that Python compatibility can be worked around. I used Python
for a few years and wrote about 70k LOC in it, until it started to
irritate me that every new version has incompatibilities (such as 2.4
vs 2.3 vs 2.5), which makes maintaining and testing way harder than
it should be. It's not just compatibility with missing library
functions; sometimes even an expression evaluated to a different value
under a new version. This was similar to the PHP 4 to PHP 5 migration.
Today I have 3 versions of Python installed because of software
requirements.
For simple scripts it can probably work if you stick to some common subset.
Scripting via a Maven plugin has the advantage that the user does not need to
install anything; there are a couple of languages available: Scala,
Groovy, Jelly, JRuby. Maybe Jython too.
Pretty much all of the j* in JSR223 land is an abomination of one sort or
another, actually :)

Cos
Radim Kolar
2012-11-22 01:57:08 UTC
Permalink
Post by Konstantin Boudnik
Pretty much all of the j* in JSR223 land is an abomination of one sort or
another, actually :)
JRuby is good because you can run a Rails application on standard Java
infrastructure, which is way easier to maintain than obscure Ruby
application servers.
Steve Loughran
2012-11-22 09:21:13 UTC
Permalink
Scripting via a Maven plugin has the advantage that the user does not need to install
anything; there are a couple of languages available: Scala, Groovy, Jelly,
JRuby. Maybe Jython too.
The JSR-223 bridge comes with a JavaScript interpreter built in, BTW. You
can actually use it in Ant's <script> and <scriptdef> tasks without even
having to stick a new JAR on the CP. That doesn't mean it's ideal.

There was recent discussion on bigtop dev about moving to a later version
of groovy; Roman found they ran into some problem where the maven groovy
code was reluctant to upgrade:

http://groovy.329449.n5.nabble.com/groovy-maven-td4382545.html#a4382976
Radim Kolar
2012-11-21 20:46:54 UTC
Permalink
Post by Alejandro Abdelnur
Why not do a maven-plugin to do that?
Maven plugins are difficult to maintain. It's better to use inline
scripts, with something like this:

http://docs.codehaus.org/display/GMAVEN/Home;jsessionid=E29093B96230BBB4461F02A1718A6B71
Konstantin Boudnik
2012-11-21 21:33:55 UTC
Permalink
Post by Radim Kolar
Post by Alejandro Abdelnur
Why not do a maven-plugin to do that?
Maven plugins are difficult to maintain. It's better to use inline
http://docs.codehaus.org/display/GMAVEN/Home;jsessionid=E29093B96230BBB4461F02A1718A6B71
Exactly my point, thank you!

Cos
Steve Loughran
2012-11-22 09:14:19 UTC
Permalink
Post by Matt Foley
This discussion started in
Those of us involved in the branch-1-win port of Hadoop to Windows without
use of Cygwin, have faced the issue of frequent use of shell scripts
throughout the system, both in build time (eg, the utility
"saveVersion.sh"),
and run time (config files like "hadoop-env.sh" and the start/stop scripts
in "bin/*" ). Similar usages exist throughout the Hadoop stack, in all
projects.
The vast majority of these shell scripts do not do anything platform
specific; they can be expressed in a posix-conforming way. Therefore, it
seems to us that it makes sense to start using a cross-platform scripting
language, such as python, in place of shell for these purposes. For those
rare occasions where platform-specific functionality really is needed,
python also supports quite a lot of platform-specific functionality on both
Linux and Windows; but where that is inadequate, one could still
conditionally invoke a platform-specific module written in shell (for
Linux/*nix) or powershell or bat (for Windows).
The primary motive for moving to a cross-platform scripting language is
maintainability. The alternative would be to maintain two complete suites
of scripts, one for Linux and one for Windows (and perhaps others in the
future). We want to avoid the need to update dual modules in two different
languages when functionality changes, especially given that many Linux
developers are not familiar with powershell or bat, and many Windows
developers are not familiar with shell or bash.
I'd argue that a lot of Hadoop java developers aren't that familiar with
bash. It's only in the last six months that I've come to hate it properly.

In the ant project, it was the launcher scripts that had the worst
bugrep:line ratio, as
-variations in .sh behaviour, especially under cygwin, but also things
that weren't bash (AIX, ...)
-requirements of the entire unix command set for real work
-variants in the parameters/behaviour of those commands between Linux and
other widely used Unix systems (e.g. OSX)
-lack of inclusion of the .sh scripts in the junit test suite
-lack of understanding of bash.

In the ant project we added a Python launcher in, what, 2001, based on the
Post by Matt Foley
For run-time, there is likely to be a lot more discussion. Lots of folks,
including me, aren't real happy with use of active scripts for
configuration, and various others, including I believe some of the Bigtop
folks, have issues with the way the start/stop scripts work. Nevertheless,
all those scripts exist today and are widely used. And they present an
impediment to porting to Windows-without-cygwin.
They're a maintenance and support cost on Unix. Too many scripts, even more
in Yarn, weakly-nondeterministic logic for loading env variables,
especially between init.d and bin/hadoop; not much diagnostics. And as with
Ant, a relatively under-comprehended language with no unit test coverage.

I'd replace the bash logic with python for Unix dev and maintenance alone.
You could put your logic into a shared python module in usr/lib/hadoop/bin
, have PyUnit test the inner functions as part of the build and test
process (& jenkins).
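
A minimal sketch of that idea, assuming a hypothetical helper resolve_conf_dir that replaces one piece of env-variable logic. The fallback order and the default path are illustrative assumptions, not Hadoop's actual behaviour. Taking the environment as a plain dict argument, rather than reading os.environ inside the function, is what makes it testable with PyUnit (unittest).

```python
import os
import unittest

# Hypothetical default location, for illustration only.
DEFAULT_CONF_DIR = "/etc/hadoop/conf"


def resolve_conf_dir(env, default=DEFAULT_CONF_DIR):
    """Choose a config directory from an environment mapping.

    Assumed precedence: HADOOP_CONF_DIR, then HADOOP_HOME/conf, then default.
    """
    conf_dir = env.get("HADOOP_CONF_DIR")
    if conf_dir:
        return conf_dir
    home = env.get("HADOOP_HOME")
    if home:
        return os.path.join(home, "conf")
    return default


class ResolveConfDirTest(unittest.TestCase):
    def test_explicit_conf_dir_wins(self):
        env = {"HADOOP_CONF_DIR": "/custom/conf", "HADOOP_HOME": "/opt/hadoop"}
        self.assertEqual(resolve_conf_dir(env), "/custom/conf")

    def test_falls_back_to_home(self):
        env = {"HADOOP_HOME": "/opt/hadoop"}
        expected = os.path.join("/opt/hadoop", "conf")
        self.assertEqual(resolve_conf_dir(env), expected)

    def test_default_when_nothing_set(self):
        self.assertEqual(resolve_conf_dir({}), DEFAULT_CONF_DIR)
```

A launcher script would call resolve_conf_dir(os.environ), while the test case can be picked up by whatever unittest runner the build uses.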
Post by Matt Foley
Nothing about run-time use of scripts has changed significantly over the
past three years, and I don't think we should hold up the Windows port
while we have a huge discussion about issues that veer dangerously into
religious/aesthetic domains. It would be fun to have that discussion, but I
don't want this decision to be dependent on it!
With YARN it's got more complex. More env variables to set, more support
calls when they aren't.
Post by Matt Foley
So I propose that we go ahead and also approve python as a run-time
dependency, and allow the inclusion of python scripts in place of current
shell-based functionality. The unpleasant alternative is to spawn a bunch
of powershell scripts in parallel to the current shell scripts, with a very
negative impact on maintainability. The Windows port must, after all, be
allowed to proceed.
+1 to any vote to allow .py at run time as a new feature

=0 to ripping out and replacing the existing .sh scripts with python code,
as even though I don't like the scripts, replacing them could be traumatic
downstream.

+1 to a gradual migration to .py for new code, starting with the yarn
scripts.
Radim Kolar
2012-11-23 23:40:56 UTC
Permalink
The discussion seems to have ended; let's start a vote.
Matt Foley
2012-11-24 20:13:18 UTC
Permalink
Please see new [VOTE] thread.
Post by Radim Kolar
The discussion seems to have ended; let's start a vote.
Radim Kolar
2012-11-24 21:26:12 UTC
Permalink
We have not discussed the advantages of standalone Python vs
Jython in a Maven POM:

http://code.google.com/p/jy-maven-plugin/

The language is about the same, and it does not need to be installed, which
is an advantage on Windows.
Konstantin Boudnik
2012-11-24 22:03:23 UTC
Permalink
If we decide to go with Maven then there's no point in complicating the
picture with Jython. This time I will keep the offensive remarks about *ython to myself
;)

Cos
Post by Radim Kolar
We have not discussed the advantages of standalone Python vs
Jython in a Maven POM:
http://code.google.com/p/jy-maven-plugin/
The language is about the same, and it does not need to be installed,
which is an advantage on Windows.
Matt Foley
2012-11-24 20:13:06 UTC
Permalink
For discussion, please see previous thread "[PROPOSAL] introduce Python as
build-time and run-time dependency for Hadoop and throughout Hadoop stack".

This vote consists of three separate items:

1. Contributors shall be allowed to use Python as a platform-independent
scripting language for build-time tasks, and add Python as a build-time
dependency.
Please vote +1, 0, -1.

2. Contributors shall be encouraged to use Maven tasks in combination with
either plug-ins or Groovy scripts to do cross-platform build-time tasks,
even under ant in Hadoop-1.
Please vote +1, 0, -1.

3. Contributors shall be allowed to use Python as a platform-independent
scripting language for run-time tasks, and add Python as a run-time
dependency.
Please vote +1, 0, -1.

Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to
use Maven plug-ins or Groovy as the only means of cross-platform build-time
tasks, or to simply continue using platform-dependent scripts as is being
done today.

Vote closes at 12:30pm PST on Saturday 1 December.
---------
Personally, my vote is +1, +1, +1.
I think #2 is preferable to #1, but still has many unknowns in it, and
until those are worked out I don't want to delay moving to cross-platform
scripts for build-time tasks.

Best regards,
--Matt
Chris Nauroth
2012-11-25 07:18:15 UTC
Permalink
+1, +1, +1 (non-binding)
Steve Loughran
2012-11-25 12:39:02 UTC
Permalink
Post by Matt Foley
For discussion, please see previous thread "[PROPOSAL] introduce Python as
build-time and run-time dependency for Hadoop and throughout Hadoop stack".
1. Contributors shall be allowed to use Python as a platform-independent
scripting language for build-time tasks, and add Python as a build-time
dependency.
Please vote +1, 0, -1.
+1
Post by Matt Foley
2. Contributors shall be encouraged to use Maven tasks in combination with
either plug-ins or Groovy scripts to do cross-platform build-time tasks,
even under ant in Hadoop-1.
Please vote +1, 0, -1.
+1

My feelings on Maven are well known, but Groovy can mitigate things. And
I'm not going to advocate post-M2 build tools such as Gradle.

It's ironic that Maven's utter inflexibility forces people to use scripting
languages to get their work done, but Groovy is fairly nimble here -and
easy to learn for any Java programmer. "Groovy in Action" is the book to
own.
Post by Matt Foley
3. Contributors shall be allowed to use Python as a platform-independent
scripting language for run-time tasks, and add Python as a run-time
dependency.
Please vote +1, 0, -1.
+1. I look forward to never having to debug shell script env variable
inheritance ever again.

This does not mean that I advocate writing big bits of the system in .py;
as someone who is debugging OpenStack request throttling this weekend, I
know that Python is not "the solution" to problems. For Hadoop it has a
role, but the role should be ('better than bash') and ('streaming
integration').
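As a rough illustration of the env-variable point (a sketch only, not code from Hadoop): in Python the child's environment is an explicit dictionary handed to the process launcher, so what a child process sees can be read off the call site. The variable name HADOOP_EXAMPLE_OPTS and the helper run_with_env are made up for this example.

```python
import os
import subprocess
import sys


def run_with_env(argv, overrides):
    """Run a child process with an explicit environment and return its stdout."""
    env = dict(os.environ)  # copy, so the parent's environment is left untouched
    env.update(overrides)   # only these keys differ from what the parent has
    return subprocess.check_output(argv, env=env).decode().strip()


# The child is a tiny Python one-liner that prints the variable it was given.
# HADOOP_EXAMPLE_OPTS is a hypothetical name, not a real Hadoop setting.
child = [sys.executable, "-c",
         "import os; print(os.environ['HADOOP_EXAMPLE_OPTS'])"]
result = run_with_env(child, {"HADOOP_EXAMPLE_OPTS": "-Xmx1g"})
```

The parent's own os.environ is not modified by the call, which is the property that tends to be hard to guarantee with sourced shell scripts.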
Robert Evans
2012-11-26 16:16:23 UTC
Permalink
+1, +1, 0
Adam Berry
2012-11-26 16:45:56 UTC
Permalink
0, +1, -1 (non-binding)

Also, it feels like the discussion should perhaps have been kept open a little longer; the Thanksgiving holiday last week meant that people may have missed it.

Cheers,
Adam
Colin McCabe
2012-11-26 16:53:38 UTC
Permalink
Nonbinding, but:

+1, +1, 0.

Also, let's please clearly define the versions of Python we support if
we do choose to go this route. Something like 2.4+ would be
reasonable. The process-launching APIs in particular changed a lot in
those early 2.x releases.

best,
Colin
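One way a script could enforce a declared minimum version is sketched below. This is only an illustration: the (2, 4) floor is the figure suggested above, not a project decision, and the function names are invented for the example.

```python
import sys

# Hypothetical floor, taken from the "2.4+" suggestion above.
MINIMUM_VERSION = (2, 4)


def is_supported(version_info=None, minimum=MINIMUM_VERSION):
    """Return True if the given (or running) interpreter meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); tuple comparison is lexicographic.
    return tuple(version_info[:2]) >= tuple(minimum)


def require_supported_python():
    """Exit with a readable message instead of failing later on a missing API."""
    if not is_supported():
        sys.stderr.write(
            "Python %d.%d or later is required; found %d.%d\n"
            % (MINIMUM_VERSION + tuple(sys.version_info[:2]))
        )
        sys.exit(1)
```

A script would call require_supported_python() before importing anything version-sensitive, so that an old interpreter produces one clear message rather than a traceback.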
Luke Lu
2012-11-26 17:25:18 UTC
Permalink
-1, +1, -1.

If we want to introduce a "platform independent" scripting language, we
should not choose python, as it has a bad track record for compatibility
(between versions/platforms).

+1 to use groovy, as we can control the version of groovy jars included in
our distribution.

__Luke
Chris Nauroth
2012-11-26 17:44:29 UTC
Permalink
Declaring 2.4 to be the minimum supported version sounds like a great idea.
I've worked with CentOS distributions that have a dependency on Python
2.4, and it was always awkward to get a later version on those machines.

Thank you,
--Chris
Post by Colin McCabe
+1, +1, 0.
Also, let's please clearly define the versions of Python we support if
we do choose to go this route. Something like 2.4+ would be
reasonable. The process-launching APIs in particular changed a lot in
those early 2.x releases.
best,
Colin
Radim Kolar
2012-11-26 17:34:12 UTC
Permalink
-1, +1, -1
Konstantin Boudnik
2012-11-26 18:30:33 UTC
Permalink
-1, +1, -1

Thanks
Suresh Srinivas
2012-11-26 20:41:56 UTC
Permalink
+1, +1, +1

Regards,
Suresh
--
http://hortonworks.com/download/
Giridharan Kesavan
2012-11-26 21:16:09 UTC
Permalink
+1, +1, +1

-Giri
Alejandro Abdelnur
2012-11-26 21:52:45 UTC
Permalink
Matt,

The scope of this vote seems different from what was discussed in the
PROPOSAL thread.

In the PROPOSAL thread you indicated this was for Hadoop1 because it is ANT
based. And the main reason was to remove saveVersion.sh.

Your #3 was not discussed in the proposal, was it?

It seems this vote is dragging in much more than was originally discussed.
I think you should suspend the vote, recap the motivation and then restart
the vote. As things are laid out at the moment my vote is:

-1 (It still seems like overkill to introduce a new runtime requirement for
building just to replace a script.)
+1 (I think this is the right way to simplify the build.)
-1 (AFAIK there is no such requirement at the moment, and if it comes it
would be in the form of an AM, which I'd argue should live outside of
Hadoop.)

Thx


--
Alejandro
Radim Kolar
2012-11-26 22:17:40 UTC
Permalink
Post by Alejandro Abdelnur
In the PROPOSAL thread you indicated this was for Hadoop1 because it is ANT
based. And the main reason was to remove saveVersion.sh.
Your #3 was not discussed in the proposal, was it?
It was part of the original proposal, but it was not discussed much because
the language war was the more attractive option. Do you want a vote like this?

1. Using an external language vs. a Maven plugin for the build.
2. Using an external language for startup scripts vs. a JVM scripting language,
such as Jython as used in WebSphere.
3. Choosing Python as the external language.
Matt Foley
2012-11-29 22:39:15 UTC
Permalink
Hi Alejandro,
Please see in-line below.
Post by Alejandro Abdelnur
Matt,
The scope of this vote seems different from what was discussed in the
PROPOSAL thread.
In the PROPOSAL thread you indicated this was for Hadoop1 because it is ANT
based. And the main reason was to remove saveVersion.sh.
Your #3 was not discussed in the proposal, was it?
The item #3 was in my original statement of the problem, with which I
started the proposal thread. In fact, the thread title was "[PROPOSAL]
introduce Python as build-time and run-time dependency for Hadoop and
throughout Hadoop stack". It is true that only one or two people chose to
discuss #3 further in that thread.

The point is not just to replace a single script, but to provide a means to
do cross-platform scripts, which will over time replace many
non-platform-specific scripts written in platform-specific languages.
Post by Alejandro Abdelnur
It seems this vote is dragging in much more than was originally discussed.
I think you should suspend the vote, recap the motivation and then restart
the vote.
I respectfully disagree. I believe a careful reading of the cited
discussion thread, plus my own statement of the vote, provides sufficient
background for a thoughtful decision on the subject. Presumably so do the
ten other people who had already voted before you made that comment.

If several other people want more discussion first, please speak up.
Thanks,
--Matt
Alejandro Abdelnur
2012-11-29 23:26:34 UTC
Permalink
Matt, thanks for the clarification.

I may have missed the main point of the PROPOSAL thread then. I personally
want to continue the discussion before voting.

* Python as a runtime requirement. Are you planning to migrate all Bash
scripts provided by Hadoop (or dynamically created, i.e. launcher scripts)
to Python?
* What else in the current build, besides saveVersion.sh, do you see as a
candidate to be migrated to Python?
* How are you planning to define which Python modules can be used? Will
developers have to install them manually?

Cheers
--
Alejandro
Radim Kolar
2012-11-30 00:29:11 UTC
Permalink
* What else in the current build, besides saveVersion.sh, do you see as a
candidate to be migrated to Python?

inline ant scripts
Steve Loughran
2012-11-30 13:20:37 UTC
Permalink
Post by Alejandro Abdelnur
* What else in the current build, besides saveVersion.sh, do you see as a
candidate to be migrated to Python?
inline ant scripts
=0. Ant's versioning is stricter; you can pull down the exact Jar versions,
and some of us in the Ant team worked very hard to get it going everywhere.
You don't gain anything by going to .py

-steve
Radim Kolar
2012-11-30 13:40:35 UTC
Permalink
Post by Radim Kolar
inline ant scripts
=0. Ant's versioning is stricter; you can pull down the exact Jar versions,
and some of us in the Ant team worked very hard to get it going everywhere.
You don't gain anything by going to .py
There are sh scripts inside the Maven Ant plugin stuff.
Jitendra Pandey
2012-11-30 22:49:53 UTC
Permalink
+1, +1, +1
--
<http://hortonworks.com/download/>
Steve Loughran
2012-12-01 10:48:23 UTC
Permalink
Post by Radim Kolar
inline ant scripts
Post by Radim Kolar
Post by Steve Loughran
=0. Ant's versioning is stricter; you can pull down the exact Jar versions,
and some of us in the Ant team worked very hard to get it going everywhere.
You don't gain anything by going to .py
there are sh scripts inside maven ant plugin stuff
Which is because there are some things you can't do in Java: run rpmbuild
to pick up file permissions and hanging symlinks that only become valid on
deployment.

The reason Ant is used to start them is that Maven views trying to run native
scripts as a forbidden action, probably popping up some patronising text like
"you are trying to run a shell script, please look at
maven.apache.org/wiki/whymavenwontletyoudothings/ to understand this". They
also view building RPMs as something not to encourage.

(but we digress into an ant vs maven argument. I do actually appreciate the
consistent target naming across projects and the ability for the IDE to set
up structure, it's just the entire underlying architecture and
implementation that I dislike)
Alejandro Abdelnur
2012-11-30 01:25:25 UTC
Permalink
Matt,

Let me repost my previous questions and a few more. I'd appreciate your
answers, as it will help me understand the full impact this would have in
Hadoop and related projects.

* Python as a runtime requirement. Are you planning to migrate all Bash
scripts provided by Hadoop (or dynamically created, i.e. launcher scripts)
to Python?
* What else in the current build, besides saveVersion.sh, do you see as a
candidate to be migrated to Python?
* How are you planning to define which Python modules can be used? Will
developers have to install them manually?
* What kind of tasks do you envision Python scripts will enable that are not
possible today?
* Will the requirement of Python be pushed to clients using the hadoop
script? If so, this would affect all downstream projects that use the hadoop
script in one way or another, right?

Is the main motivation of the proposal to make things easier for Windows, so
there is no need for Cygwin? If that is the case, have you considered writing
BAT scripts directly? If you take Tomcat for example, they have BAT scripts
and SH scripts and things work quite nicely.

Personally, I wouldn't be thrilled to see the logic in the scripts get
more complex; I'd rather go in the opposite direction. IMO, scripts should be
trimmed to set env vars (with no voodoo logic), build the classpath (with no
voodoo logic, just from a set of dirs) and call Java.
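For what it's worth, the trimmed-down launcher described above is small in either language. A minimal Python sketch follows, assuming the jars live directly inside the listed directories; the function names and the main class shown in the usage note are hypothetical and not taken from the Hadoop scripts.

```python
import glob
import os


def build_classpath(dirs):
    """Collect the jars in each directory, in order, into one classpath string."""
    entries = []
    for d in dirs:
        # sorted() keeps the order stable across platforms and runs.
        entries.extend(sorted(glob.glob(os.path.join(d, "*.jar"))))
    # os.pathsep is ':' on Linux and ';' on Windows.
    return os.pathsep.join(entries)


def java_command(main_class, dirs, args=(), java_home=None):
    """Build the argv list for launching main_class; it does not run anything."""
    java_home = java_home or os.environ.get("JAVA_HOME")
    # Fall back to whatever "java" is on the PATH if JAVA_HOME is not set.
    java = os.path.join(java_home, "bin", "java") if java_home else "java"
    return [java, "-classpath", build_classpath(dirs), main_class] + list(args)
```

Running it would be a single subprocess.call(java_command("org.example.Main", ["lib"])), where org.example.Main and lib are placeholders. The same file works on Linux and Windows because the path and classpath separators come from the os module.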

Finally, this is a code change, so I'm not sure why we are doing a vote.

Thx.
Post by Alejandro Abdelnur
Matt, thanks for the clarification.
I may have missed the main point of the PROPOSAL thread then. I personally
want to continue the discussion before voting.
* Python as a runtime requirement. Are you planning to migrate all Bash
scripts provided by Hadoop (or dynamically created, i.e. launcher scripts)
to Python?
* What else in the current build, besides saveVersion.sh, do you see as a
candidate to be migrated to Python?
* How are you planning to define which Python modules can be used? Will
developers have to install them manually?
Cheers
Post by Matt Foley
Hi Alejandro,
Please see in-line below.
Post by Alejandro Abdelnur
Matt,
The scope of this vote seems different from what was discussed in the
PROPOSAL thread.
In the PROPOSAL thread you indicated this was for Hadoop1 because it is
ANT
Post by Alejandro Abdelnur
based. And the main reason was to remove saveVersion.sh.
Your #3 was not discussed in the proposal, was it?
The item #3 was in my original statement of the problem, with which I
started the proposal thread. In fact, the thread title was "[PROPOSAL]
introduce Python as build-time and run-time dependency for Hadoop and
throughout Hadoop stack". It is true that only one or two people chose to
discuss #3 further in that thread.
The point is not just to replace a single script, but to provide a means to
do cross-platform scripts, which will over time replace many
non-platform-specific scripts written in platform-specific languages.
Post by Alejandro Abdelnur
It seems this vote is dragging much more stuff it was originally
discussed.
Post by Alejandro Abdelnur
I think you should suspend the vote, recap the motivation and then
restart
Post by Alejandro Abdelnur
the vote.
I respectfully disagree. I believe a careful reading of the cited
discussion thread, plus my own statement of the vote, provides sufficient
background for a thoughtful decision on the subject. Presumably so do the
ten other people who had already voted before you made that comment.
If several other people want more discussion first, please speak up.
Thanks,
--Matt
Post by Alejandro Abdelnur
-1 (It still seems an overkill to introduce a new runtime requirement
for
Post by Alejandro Abdelnur
building to replace a script.)
+1 (I think this is the right way to simplify the build)
-1 (AFAIK there is not such requirement at the moment, and if it comes
it
Post by Alejandro Abdelnur
would be in the form of an AM, which I'd argue it should leave outside
of
Post by Alejandro Abdelnur
Hadoop)
Thx
On Mon, Nov 26, 2012 at 1:16 PM, Giridharan Kesavan <
Post by Matt Foley
+1, +1, +1
-Giri
Post by Matt Foley
For discussion, please see previous thread "[PROPOSAL] introduce
Python
Post by Alejandro Abdelnur
Post by Matt Foley
as
Post by Matt Foley
build-time and run-time dependency for Hadoop and throughout Hadoop
stack".
Post by Matt Foley
1. Contributors shall be allowed to use Python as a
platform-independent
Post by Matt Foley
Post by Matt Foley
scripting language for build-time tasks, and add Python as a
build-time
Post by Alejandro Abdelnur
Post by Matt Foley
Post by Matt Foley
dependency.
Please vote +1, 0, -1.
2. Contributors shall be encouraged to use Maven tasks in
combination
Post by Alejandro Abdelnur
Post by Matt Foley
with
Post by Matt Foley
either plug-ins or Groovy scripts to do cross-platform build-time
tasks,
Post by Matt Foley
Post by Matt Foley
even under ant in Hadoop-1.
Please vote +1, 0, -1.
3. Contributors shall be allowed to use Python as a
platform-independent
Post by Matt Foley
Post by Matt Foley
scripting language for run-time tasks, and add Python as a run-time
dependency.
Please vote +1, 0, -1.
Note that voting -1 on #1 and +1 on #2 essentially REQUIRES
contributors
Post by Matt Foley
to
Post by Matt Foley
use Maven plug-ins or Groovy as the only means of cross-platform
build-time
Post by Matt Foley
tasks, or to simply continue using platform-dependent scripts as is
being
Post by Matt Foley
Post by Matt Foley
done today.
Vote closes at 12:30pm PST on Saturday 1 December.
---------
Personally, my vote is +1, +1, +1.
I think #2 is preferable to #1, but still has many unknowns in it,
and
Post by Alejandro Abdelnur
Post by Matt Foley
Post by Matt Foley
until those are worked out I don't want to delay moving to
cross-platform
Post by Matt Foley
Post by Matt Foley
scripts for build-time tasks.
Best regards,
--Matt
--
Alejandro
--
Alejandro
--
Alejandro
Matt Foley
2012-11-30 02:26:47 UTC
Permalink
Hello again. Crossed in the mail.

* What kind of tasks do you envision Python scripts will enable that are
not possible today?
The point isn't to open brave new worlds. The point is to avoid the
nightmare of having to maintain multiple "parallel" scripts doing the SAME
THING in multiple scripting languages. I know from experience that they
never get maintained right. It's just a huge source of bugs, because when
they are in different languages, it can be quite difficult to determine
that they are *really* doing the same thing. And in a case like shell vs
powershell, it will be very common to have contributors who are not experts
in both.

I care deeply about having a high-quality release in both Linux and
Windows. And having a cross-platform scripting language will make it much
easier to maintain that quality over time, without "slip" between the two
platforms.

* Will the requirement of Python be pushed to clients using the
hadoop script? If so, this would affect all downstream projects that use the
hadoop script in one way or another, right?
If question #3 passes, then Python will become a run-time dependency for
Hadoop. That means it would need to be installed as part of the Hadoop
install preparation, just like all the other Hadoop run-time dependencies.

Is the main motivation of the proposal to make things easier for Windows,
so there is no need for Cygwin? If that is the case, have you considered
writing BAT scripts directly? If you take Tomcat for example, they have BAT
scripts and SH scripts and things work quite nicely.
Of course it is sufficient, from the simple implementation perspective, to
translate all the shell scripts into bat or (better) powershell scripts.
That is, in fact, the most evident alternative to my proposals #1 and #3.

However, I ask -- beg! -- the community to consider it from the software
engineering perspective. We aren't here to just implement something once
and be done. It has to be maintained, as most of you on this list are well
aware, for years and years, across multiple generations. And trying to
maintain parallel scripts in multiple languages, when not necessitated by
genuine platform-specific requirements, is just creating bug generators in
the system.

Personally, I wouldn't be thrilled to see the logic in the scripts
get more complex; I'd rather go in the opposite direction. IMO, scripts
should be trimmed to set env vars (with no voodoo logic), build the
classpath (with no voodoo logic, just from a set of dirs) and call Java.
See the first item above. The point is to enable cross-platform scripting
of the things we already have to script. IMO, scripts should get out of
the env var business entirely, but that's unrelated to this question :-)
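
As a rough illustration of the kind of script in question (a sketch only, not code from Hadoop or from this thread): building a classpath from a set of directories and calling Java can be written once in Python and run unchanged on Linux and Windows. The function names and the example main class are hypothetical, and the sketch assumes a `java` executable is on the PATH.

```python
import glob
import os
import subprocess


def build_classpath(dirs):
    """Join every .jar found directly under the given directories.

    os.pathsep is ':' on Linux and ';' on Windows, so the same code
    yields a valid classpath string on both platforms.
    """
    jars = []
    for d in dirs:
        # Sorting keeps the classpath order deterministic within a directory.
        jars.extend(sorted(glob.glob(os.path.join(d, "*.jar"))))
    return os.pathsep.join(jars)


def build_java_command(main_class, lib_dirs, args, java="java"):
    """Return the argument list for launching a Java main class."""
    return [java, "-cp", build_classpath(lib_dirs), main_class] + list(args)


def launch(main_class, lib_dirs, args):
    """Run Java and return its exit code to the caller."""
    # Passing a list (no shell involved) avoids platform-specific quoting.
    return subprocess.call(build_java_command(main_class, lib_dirs, args))


# Hypothetical usage: launch("org.example.Main", ["lib"], ["--help"])
```

The only platform-dependent pieces, the path separator and the directory separator, are handled by `os.pathsep` and `os.path.join`, so no parallel shell and bat versions are needed for this kind of task.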

Finally, this is a code change, so I'm not sure why we are doing a vote.

I view this as a tools issue that affects questions that go beyond the
one-time choice of how to write (or re-write) saveVersion.sh. Also Aaron
(atm) recommended that I bring it to the list. So here we are :-)

Cheers,
--Matt
Chuan Liu
2012-11-30 03:22:52 UTC
Permalink
+1 +1 +1

Agree with Matt on the code maintainability.

I think on one side we have shell, which is a scripting language but OS dependent (e.g. bash vs PowerShell);
on the other side we have Java, which is OS independent but not a scripting language.
I would accept any language that fills the gap as an OS-independent scripting language.
Personally, I also prefer Python over Ruby.

Thanks,
Chuan

Bikas Saha
2012-11-30 04:27:22 UTC
Permalink
+1, +1, +1 (non-binding)

We have had promising results for #1 and #2 when porting to Windows. #3 would
allow us to remove platform dependencies from test code. Agreed that there
might be some nuanced operations that require OS-specific environments, but
this would help keep them to a minimum.

Bikas
---------
Personally, my vote is +1, +1, +1.
I think #2 is preferable to #1, but still has many unknowns in
it,
Post by Alejandro Abdelnur
Post by Matt Foley
and
Post by Alejandro Abdelnur
Post by Matt Foley
Post by Matt Foley
until those are worked out I don't want to delay moving to
cross-platform
Post by Matt Foley
Post by Matt Foley
scripts for build-time tasks.
Best regards,
--Matt
--
Alejandro
--
Alejandro
--
Alejandro
Luke Lu
2012-11-30 11:21:45 UTC
Permalink
Thanks for the voting thread. Otherwise, many committers would have missed
it.

I agree that this is a superset of code change that has larger impact than
typical code change.
Post by Matt Foley
Post by Alejandro Abdelnur
Finally, this is code change, so I'm not sure why we are doing a vote.
I view this as a tools issue, that affects questions that go beyond the
one-time choice of how to write (or re-write) saveVersion.sh. Also Aaron
(atm) recommended that I bring it to the list. So here we are :-)
Luke Lu
2012-11-30 12:57:43 UTC
Permalink
I'd like to change my binding vote to -1, -0, -1.

Considering the hadoop stack/ecosystem as a whole, I think the best
cross-platform scripting language to adopt is jruby, for the following reasons:

1. HBase already adopted jruby for HBase shell, which all current platform
vendors support.
2. We can control the version of language implementation at a per release
basis.
3. We don't have to introduce new dependencies in the de facto hadoop
stack. (see 1).

I'm all for improving multi-platform support. I think the best way to do
this is to have thin native script wrappers (using env vars) that call the
cross-platform jruby scripts.

__Luke
Steve Loughran
2012-11-30 13:29:04 UTC
Permalink
Post by Luke Lu
I'd like to change my binding vote to -1, -0, -1.
Considering the hadoop stack/ecosystem as a whole, I think the best cross
platform scripting language to adopt is jruby for following reasons:
1. HBase already adopted jruby for HBase shell, which all current platform
vendors support.
2. We can control the version of language implementation at a per release
basis.
3. We don't have to introduce new dependencies in the de facto hadoop
stack. (see 1).
I don't see why these arguments should have any impact on using python at
build time, as it doesn't introduce any dependencies downstream. Yes, you
need python at build time, but that's no worse than having a protoc
compiler, gcc and the automake toolchain.
Post by Luke Lu
I'm all for improving multi-platform support. I think the best way to do
this is to have a thin native script wrappers (using env vars) to call the
cross-platform jruby scripts.
Were it not for the env-var configuration hierarchy mess that things are in
today, I'd agree. where do you set your env vars? hadoop-env.sh? Where does
that come from? the hadoop conf dir? How do you find that? An env variable
or a ../../conf from bin/hadoop.sh which breaks once you start symlinking
to hadoop/bin; or do you assume a root installation in /etc/hadoop/conf,
which points to /etc/alternatives/hadoop-conf, which can then point back to
/etc/hadoop/conf.pseudo ? And what about JAVA_HOME?

Those env vars are something I'd like to see the back of.
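
To make the lookup-order question above concrete, here is a minimal sketch of resolving the configuration directory in one place. It is only an illustration: the `HADOOP_CONF_DIR` variable name, the `conf` directory next to `bin`, and the `/etc/hadoop/conf` fallback are assumptions for the example, not something settled in this thread.

```python
import os


def find_conf_dir(script_path, environ=None, system_default="/etc/hadoop/conf"):
    """Return the first configuration directory candidate that exists, or None.

    Order (an assumption for this sketch): explicit env var, then a conf/
    directory next to the real location of the launcher, then a system default.
    """
    environ = os.environ if environ is None else environ
    candidates = []

    explicit = environ.get("HADOOP_CONF_DIR")  # assumed variable name
    if explicit:
        candidates.append(explicit)

    # realpath() follows symlinks, so a symlinked launcher still resolves
    # relative to the real installation rather than the link's directory.
    real_bin = os.path.dirname(os.path.realpath(script_path))
    candidates.append(os.path.join(os.path.dirname(real_bin), "conf"))

    candidates.append(system_default)

    for candidate in candidates:
        if os.path.isdir(candidate):
            return candidate
    return None
```

Keeping the precedence in a single function at least makes the "where does that come from?" question answerable by reading one place.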
Luke Lu
2012-11-30 13:49:37 UTC
Permalink
Post by Steve Loughran
where do you set your env vars... and what about JAVA_HOME
There should be only two env vars (JAVA_HOME and HADOOP_HOME) to deal with
in the native scripts (.bat on windows and .sh on unix platforms) to
bootstrap jruby scripts, which deal with the rest of the envs.

__Luke
Luke Lu
2012-11-30 14:02:48 UTC
Permalink
Post by Steve Loughran
Yes, you need python at build time, but that's no worse than having a protoc
compiler, gcc and the automake toolchain.
The problem is that python is known to have _backward_ compatibility issues
on various platforms. It would be very annoying/time consuming to deal with
various support issues regarding python versions on various platforms.

I agree that autotools is a nightmare and should be converted (in branch-1
as well) to cmake (which has good versioning support :) The goal is to have
fewer external dependencies, not more, again mostly due to support issues.
If we want to introduce an external dependency, we need to pick something
that is easy to support compatibility-wise.

__Luke
Arun C Murthy
2012-12-02 18:20:46 UTC
Permalink
Post by Matt Foley
Hello again. Crossed in the mail.
* What kind of tasks you envision Python scripts will enable that are
not possible today?
The point isn't to open brave new worlds. The point is to avoid the
nightmare of having to maintain multiple "parallel" scripts doing the SAME
THING in multiple scripting languages.
+1, +1, +1

Couldn't agree more, I don't want to be in the business of having the same logic in multiple platform-specific scripts - doesn't make any sense.

Arun
Matt Foley
2012-11-30 01:51:36 UTC
Permalink
Post by Alejandro Abdelnur
Python as runtime requirement. Are you planning to migrate all
BASH scripts provided by Hadoop (or dynamically created, i.e. launcher
scripts) to Python?

I don't intend to mandate use of Python. Rather, I want there to be a
cross-platform option available. Things that are best done in a
platform-specific manner should be done in shell for Linux, and PowerShell
for Windows. But things that are best done in a platform-independent way,
can be, with a lower long-term maintenance cost than using different
scripts per platform.

This means that some, but not all, existing scripts may naturally migrate
to Python as the overall system is ported to Windows. Hopefully when
someone is porting a script that can be well done in a platform-independent
way, they will be able to choose Python and write a single script that can
replace the shell script and make it unnecessary to maintain two scripts
(doing the same job but in different languages!) going forward.
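
As an illustration of what such a single cross-platform script might look like, here is a minimal sketch in the spirit of a build-time version stamp. It is not the actual saveVersion.sh logic: the function names, the field names, and the properties-style output are assumptions made for the example, and it is written for a current Python 3.

```python
import datetime
import getpass


def build_version_info(version, revision, now=None, user=None):
    """Collect build metadata without calling out to whoami, date, or other shell tools."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return {
        "version": version,
        "revision": revision,
        "user": user or getpass.getuser(),
        "date": now.strftime("%Y-%m-%dT%H:%MZ"),
    }


def render_properties(info):
    """Render the metadata as sorted key=value lines (hypothetical output format)."""
    return "\n".join("%s=%s" % (key, info[key]) for key in sorted(info)) + "\n"
```

A build would call `build_version_info(...)` with its own version and revision strings and write the rendered text to a file; the same script runs unchanged on Linux and Windows because it relies only on the standard library.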
Post by Alejandro Abdelnur
What else in the current build, besides saveVersion.sh, you see
as candidate to be migrated to Python?

I have a greatly improved version of src/docs/relnotes.py that I would like
to submit, for auto-gen of release notes.
That's all that I have on my hotlist right now, although I anticipate that
some of the shell scripts invoked by ant may be natural candidates.
Post by Alejandro Abdelnur
How are you planning to define what Python modules can be used?
Will developers have to install them manually?

That's something the community will work out, the same way they decide what
library jars to include, and when to upgrade those versions. But first,
let's get an agreement in principle that this is the direction we want to
go.

Cheers,
--Matt
Post by Alejandro Abdelnur
Matt, thanks for the clarification.
I may have missed the main point of the PROPOSAL thread then. I personally
want to continue the discussion before voting.
* Python as runtime requirement. Are you planning to migrate all BASH
scripts provided by Hadoop (or dynamically created, i.e. launcher scripts)
to Python?
* What else in the current build, besides saveVersion.sh, you see as
candidate to be migrated to Python?
* How are you planning to define what Python modules can be used? Will
developers have to install them manually?
Cheers
Roman Shaposhnik
2012-11-27 17:16:13 UTC
Permalink
Post by Matt Foley
For discussion, please see previous thread "[PROPOSAL] introduce Python as
build-time and run-time dependency for Hadoop and throughout Hadoop stack".
Perhaps I'm missing something, but I can't possibly imagine how
a vote on a common-dev-7ArZoLwFLBtd/SJB6HiN2Ni2O/***@public.gmane.org could possibly
affect downstream projects. I honestly don't think we should be
in a business of telling Pig, Hive, Oozie, etc. what to use or
not to use.

With that in mind the following vote applies ONLY to Hadoop
project itself:
-1, +1, -1
Post by Matt Foley
Personally, my vote is +1, +1, +1.
I think #2 is preferable to #1, but still has many unknowns in it, and
until those are worked out I don't want to delay moving to cross-platform
scripts for build-time tasks.
And yet #2 is, in my opinion, a much better investment of our collective
time. We are already at the mercy of the JDK, but at least it is a far superior
platform from a support and backward compatibility perspective. Anything
that we can offload to it is absolutely worth doing.

Thanks,
Roman.
Ivan Mitic
2012-11-29 23:41:22 UTC
Permalink
+1, +1, +1 (some comments inline)

-----Original Message-----
From: mfoley-RYNwJFaOa9CEK/***@public.gmane.org [mailto:mfoley-RYNwJFaOa9CEK/***@public.gmane.org] On Behalf Of Matt Foley
Sent: Saturday, November 24, 2012 12:13 PM
To: common-dev-7ArZoLwFLBtd/SJB6HiN2Ni2O/***@public.gmane.org
Subject: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

For discussion, please see previous thread "[PROPOSAL] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack".

This vote consists of three separate items:

1. Contributors shall be allowed to use Python as a platform-independent scripting language for build-time tasks, and add Python as a build-time dependency.
Please vote +1, 0, -1.

2. Contributors shall be encouraged to use Maven tasks in combination with either plug-ins or Groovy scripts to do cross-platform build-time tasks, even under ant in Hadoop-1.
Please vote +1, 0, -1.
I believe 1&2 in combination make total sense. I ported a few scripts to Python, and so far it has proved to be up to the task and to satisfy the cross-platform requirements. In my opinion, it is also important to agree on the version, as I've run into some breaking changes in version 3+.
3. Contributors shall be allowed to use Python as a platform-independent scripting language for run-time tasks, and add Python as a run-time dependency.
This is a great aspirational goal! Maintaining two sets of scripts would be a real challenge.
Please vote +1, 0, -1.

Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to use Maven plug-ins or Groovy as the only means of cross-platform build-time tasks, or to simply continue using platform-dependent scripts as is being done today.

Vote closes at 12:30pm PST on Saturday 1 December.
---------
Personally, my vote is +1, +1, +1.
I think #2 is preferable to #1, but still has many unknowns in it, and until those are worked out I don't want to delay moving to cross-platform scripts for build-time tasks.

Best regards,
--Matt
Mahadevan Venkatraman
2012-11-30 02:07:53 UTC
Permalink
+1, +1, +1 (non-binding)

Supporting Comments:

Build-time scripts: Using a platform independent language such as python (or maven in certain cases) will greatly help in reducing build breaks and improve on build script maintainability.

Run-time scripts: Most run-time scripts are end-user visible and are scripts that need to be run by an admin, such as those starting/stopping a Hadoop cluster (hadoop-daemons), or by developers submitting a job (hadoop.cmd). There seem to be two types of script files:
- Scripts intended for a cluster admin or an IT admin:
- It is desirable to use a common set of Python scripts that work across all platforms. However, in a Windows enterprise environment IT admins won't like it if they have to run Python scripts to start/stop a cluster. So for these, there should be a PowerShell interface wrapper that can accept the right parameters and pass them down to the Python script. Hopefully, the PowerShell layer can be a simple pass-through. This way the Python scripts are like any other Java code hidden behind a well-known API surface. IT admins can't debug or modify them easily, but this is fine, since for scripts like the aforementioned there isn't a requirement that IT admins be able to easily view/modify the underlying code.
- For Windows-specific things not supported by Python natively, such as setting ACLs or starting/stopping Windows services, it should be possible to refactor the code appropriately. But a little bit of PowerShell/cmd for these call-outs would be unavoidable.

- Scripts intended for developers/cluster users:
- Most of these scripts (e.g. hadoop.cmd) would be behind other API surfaces such as WebHDFS, ODBC, JDBC, Templeton etc. So the advantage of having a common script across platforms outweighs the use of cmd/PowerShell as a native Windows feature. Again, it should also be possible to provide simple PowerShell wrappers for a Windows environment.
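
The wrapper arrangement described above can be sketched as follows. This is only an assumption-laden illustration: the `cluster-ctl` program name, the `start`/`stop` actions, and the daemon argument are hypothetical, and a native PowerShell or shell wrapper is assumed to forward its arguments unchanged to this common script.

```python
import argparse


def parse_args(argv):
    """Parse the arguments a native wrapper would pass through unchanged."""
    parser = argparse.ArgumentParser(prog="cluster-ctl")  # hypothetical name
    parser.add_argument("action", choices=["start", "stop"])
    parser.add_argument("daemon", help="daemon to act on, e.g. namenode")
    return parser.parse_args(argv)


def main(argv):
    args = parse_args(argv)
    # A real script would launch or signal the daemon here; the sketch only
    # reports what it would do so it stays free of platform-specific calls.
    return "%s %s" % (args.action, args.daemon)
```

Because all validation lives in the common script, the per-platform wrappers can stay a few lines long and need no logic of their own.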

Thanks, Mahadevan.

Doug Cutting
2012-11-30 16:55:15 UTC
Permalink
-1, +1, -1

Run- & build-time scripting should be limited to operations that are
impossible in Java. These should not be complex nor should we
encourage more complexity in them. A parallel set of simple .bat
files for such operations seems preferable to adding a Python
dependency.

Doug
Raja Aluri
2012-12-01 00:57:32 UTC
Permalink
+1, +1, +1 (non binding)

It makes it a lot easier to make build tools (that cannot be developed
easily using maven) work across non-unix like platforms (especially
windows).

Raja
Eli Collins
2012-12-01 01:08:28 UTC
Permalink
-1, 0, -1

IIUC the only platform we plan to add support for that we can't easily
support today (w/o an emulation layer like cygwin) is Windows, and it
seems like making the bash scripts simpler and having parallel bat
files is IMO a better approach.
Steve Loughran
2012-12-01 10:44:31 UTC
Permalink
Post by Eli Collins
-1, 0, -1
IIUC the only platform we plan to add support for that we can't easily
support today (w/o an emulation layer like cygwin) is Windows, and it
seems like making the bash scripts simpler and having parallel bat
files is IMO a better approach.
WinNT Bat/CMD files are the worst possible scripting language invented. At
the very least, .py should be the language of choice there
Doug Cutting
2012-12-01 18:23:14 UTC
Permalink
Post by Steve Loughran
WinNT Bat/CMD files are the worst possible scripting language invented. At
the very least, .py should be the language of choice there
The scripts should not have so much logic that .bat files are a problem.

Doug
Konstantin Boudnik
2012-12-13 00:53:34 UTC
Permalink
Post by Steve Loughran
Post by Eli Collins
-1, 0, -1
IIUC the only platform we plan to add support for that we can't easily
support today (w/o an emulation layer like cygwin) is Windows, and it
seems like making the bash scripts simpler and having parallel bat
files is IMO a better approach.
WinNT Bat/CMD files are the worst possible scripting language invented. At
the very least, .py should be the language of choice there
Compared to the OS in question - it isn't _that_ bad ;)
Joep Rottinghuis
2012-12-01 20:28:31 UTC
Permalink
0, 0, -1 (non-binding)

Joep
Eric Yang
2012-12-02 06:07:36 UTC
Permalink
-1, +1, -1

Python has fairly inconsistent support across all major OS vendors. It is
hard to get it right unless the scripts are all designed to make use of
Python 2.4. However, Python 2.4 doesn't have the OS features necessary to make
Python useful in a runtime or build environment unless you write a lot of
custom modules, which defeats the purpose of using Python as an intermediate
layer to do OS-dependent work. JRuby may be a better choice.

regards,
Eric
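
The version concern raised above is usually handled by having each script state the interpreter version it was written for and fail early otherwise. A minimal sketch, assuming a floor of 2.7 chosen purely for the example (the thread does not agree on any version):

```python
import sys

MIN_VERSION = (2, 7)  # assumed floor, chosen only for the example


def check_python(current=None, minimum=MIN_VERSION):
    """Return an error message if the interpreter is too old, else None."""
    current = tuple(sys.version_info[:2]) if current is None else tuple(current[:2])
    if current < minimum:
        return "Python %d.%d or newer is required, found %d.%d" % (minimum + current)
    return None
```

A script would call `check_python()` first and exit with the returned message if it is not `None`, which turns a confusing syntax or import error on an old interpreter into a clear one-line explanation.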
Konstantin Boudnik
2012-12-13 00:55:13 UTC
Permalink
Post by Eric Yang
-1, +1, -1
Python has fairly inconsistent support across all major OS vendors. It is
hard to get it right unless the scripts are all designed to make use of
Python 2.4. However, Python 2.4 doesn't have necessary OS features to make
Python useful in runtime or build environment unless you write a lot of
custom modules. Which defeats the purpose to use python as intermediate
layer to do OS dependent work. Jruby may be a better choice.
JRuby? Really? Groovy is already there and it is really a Java dialect unlike
JRuby. And yes - it is quite suitable for build things, considering the use of
it in BigTop

Cos
Tom White
2012-12-03 14:23:50 UTC
Permalink
+1, +1, -1

Tom
Post by Matt Foley
For discussion, please see previous thread "[PROPOSAL] introduce Python as
build-time and run-time dependency for Hadoop and throughout Hadoop stack".
1. Contributors shall be allowed to use Python as a platform-independent
scripting language for build-time tasks, and add Python as a build-time
dependency.
Please vote +1, 0, -1.
2. Contributors shall be encouraged to use Maven tasks in combination with
either plug-ins or Groovy scripts to do cross-platform build-time tasks,
even under ant in Hadoop-1.
Please vote +1, 0, -1.
3. Contributors shall be allowed to use Python as a platform-independent
scripting language for run-time tasks, and add Python as a run-time
dependency.
Please vote +1, 0, -1.
Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to
use Maven plug-ins or Groovy as the only means of cross-platform build-time
tasks, or to simply continue using platform-dependent scripts as is being
done today.
Vote closes at 12:30pm PST on Saturday 1 December.
---------
Personally, my vote is +1, +1, +1.
I think #2 is preferable to #1, but still has many unknowns in it, and
until those are worked out I don't want to delay moving to cross-platform
scripts for build-time tasks.
Best regards,
--Matt
Doug Cutting
2012-12-03 18:37:36 UTC
Permalink
Post by Matt Foley
Vote closes at 12:30pm PST on Saturday 1 December.
It's not clear to me what kind of a vote this is. It seems closest to
a code change vote, since it implies code changes, although without a
specific patch yet proposed. As such it would follow lazy consensus
rules. Or is it merely intended as a straw poll, to gauge community
opinion?

Doug
Matt Foley
2012-12-03 19:21:43 UTC
Permalink
It is intended to be a "technical discussion", in the sense of the bylaws
statement (in section "Roles and Responsibilities: Committers"), "Committers
may cast binding votes on any technical discussion regarding any
subproject." I therefore intended it to be a majority vote of Committers.

Interestingly, this need to discuss tooling and other issues that go beyond
a simple "code change" is not addressed in the "Decision Making: Actions"
section of the bylaws. That need seems to have been overlooked in the
current rev of that section. But I do not agree that such issues are "code
changes"; it relates to the tools we depend on to make code changes, which
is clearly qualitatively different.

--Matt
Post by Doug Cutting
Post by Matt Foley
Vote closes at 12:30pm PST on Saturday 1 December.
It's not clear to me what kind of a vote this is. It seems closest to
a code change vote, since it implies code changes, although without a
specific patch yet proposed. As such it would follow lazy consensus
rules. Or is it merely intended as a straw poll, to gauge community
opinion?
Doug
Doug Cutting
2012-12-03 19:37:00 UTC
Permalink
Post by Matt Foley
It is intended to be a "technical discussion", in the sense of the bylaws
statement (in section "Roles and Responsibilities: Committers"), "Committers
may cast binding votes on any technical discussion regarding any
subproject." I therefore intended it to be a majority vote of Committers.
I'm not sure how you conclude that technical discussions are resolved
with majority votes.

http://www.apache.org/foundation/voting.html
Post by Matt Foley
Interestingly, this need to discuss tooling and other issues that go beyond
a simple "code change" is not addressed in the "Decision Making: Actions"
section of the bylaws. That need seems to have been overlooked in the
current rev of that section. But I do not agree that such issues are "code
changes"; it relates to the tools we depend on to make code changes, which
is clearly qualitatively different.
I don't see a striking difference between this and a proposed code
change. How is a -1 here fundamentally different than a veto on a
patch submitted to HADOOP-9082?

Doug
Matt Foley
2012-12-03 22:08:41 UTC
Permalink
Hi Doug,
The Apache voting process contradicts the Hadoop bylaws:
http://www.apache.org/foundation/voting.html says that only PMC members can
make binding votes on code modification issues, but
http://hadoop.apache.org/bylaws.html says that Committers can make binding
votes on them. Does that mean the Hadoop bylaws have to change?

Thanks,
--Matt
Post by Matt Foley
Post by Matt Foley
It is intended to be a "technical discussion", in the sense of the bylaws
statement (in section "Roles and Responsibilities: Committers"), "Committers
may cast binding votes on any technical discussion regarding any
subproject." I therefore intended it to be a majority vote of Committers.
I'm not sure how you conclude that technical discussions are resolved
with majority votes.
http://www.apache.org/foundation/voting.html
Post by Matt Foley
Interestingly, this need to discuss tooling and other issues that go beyond
a simple "code change" is not addressed in the "Decision Making: Actions"
section of the bylaws. That need seems to have been overlooked in the
current rev of that section. But I do not agree that such issues are "code
changes"; it relates to the tools we depend on to make code changes, which
is clearly qualitatively different.
I don't see a striking difference between this and a proposed code
change. How is a -1 here fundamentally different than a veto on a
patch submitted to HADOOP-9082?
Doug
Doug Cutting
2012-12-03 23:57:06 UTC
Permalink
Post by Matt Foley
http://www.apache.org/foundation/voting.html says that only PMC members can
make binding votes on code modification issues, but
http://hadoop.apache.org/bylaws.html says that Committers can make binding
votes on them. Does that mean the Hadoop bylaws have to change?
This may be a little atypical but I don't see any harm. The Hadoop
PMC is willing to respect the veto of any committer as binding. I'd
worry more if we tried to reduce vetoes to a subset of the PMC than
extend it to a superset.

Do you think this is problematic?

Doug
Matt Foley
2012-12-04 01:22:32 UTC
Permalink
No, but it speaks to whether the Hadoop bylaws can extend the Apache voting
procedures and draw finer distinctions. For example, the Apache voting
procedures only identify 3 types of votable issue, while the Hadoop bylaws
identify 9 types of votable issues.

If we were forced to fit "development tools" into one of the three
categories cited by the Apache voting procedures, it would be fitting a
square peg in a round hole. Since we can instead look at the 9 categories
provided by the Hadoop bylaws, we can acknowledge that "development tools"
was an overlooked category. But in my opinion it certainly doesn't fit
into the "code change" category. Tooling is a meta-issue regarding HOW we
do what needs to be done. In this case, whether we allow a
platform-independent solution, or force contributors to maintain parallel
scripts in multiple platform-specific languages for no reason.

--Matt
Post by Chris Nauroth
Post by Matt Foley
http://www.apache.org/foundation/voting.html says that only PMC members can
make binding votes on code modification issues, but
http://hadoop.apache.org/bylaws.html says that Committers can make binding
votes on them. Does that mean the Hadoop bylaws have to change?
This may be a little atypical but I don't see any harm. The Hadoop
PMC is willing to respect the veto of any committer as binding. I'd
worry more if we tried to reduce vetoes to a subset of the PMC than
extend it to a superset.
Do you think this is problematic?
Doug
Doug Cutting
2012-12-04 04:50:50 UTC
Permalink
Hadoop's bylaws do draw finer distinctions than the Apache voting
guidelines document, but we follow the same general principles that
are described there.

As I understand it, the rationale for using consensus for code is that
everyone needs to agree on everything in the codebase or we've
disenfranchised some. We share a single code repository and we need
to all agree on what goes into it. A release does not require
majority since if someone doesn't agree on the timing of a release
they can choose to make another at a different time, but every change
that goes into each release requires consensus. We also require
consensus for committers and PMC member votes so that we have a group
that's coherent and is able to reach consensus on code changes.

Re-writing bash scripts in Python is neither a release nor other
procedural issue. It involves changes to the software we maintain and
seems to fall clearly into the "code change" category.

If you disagree then perhaps you'd like to propose a change to the
bylaws so that scripts have different rules than other kinds of
software, but I don't yet see the rationale for such a change.

Doug
Post by Matt Foley
No, but it speaks to whether the Hadoop bylaws can extend the Apache voting
procedures and draw finer distinctions. For example, the Apache voting
procedures only identify 3 types of votable issue, while the Hadoop bylaws
identify 9 types of votable issues.
If we were forced to fit "development tools" into one of the three
categories cited by the Apache voting procedures, it would be fitting a
square peg in a round hole. Since we can instead look at the 9 categories
provided by the Hadoop bylaws, we can acknowledge that "development tools"
was an overlooked category. But in my opinion it certainly doesn't fit
into the "code change" category. Tooling is a meta-issue regarding HOW we
do what needs to be done. In this case, whether we allow a
platform-independent solution, or force contributors to maintain parallel
scripts in multiple platform-specific languages for no reason.
--Matt
Post by Chris Nauroth
Post by Matt Foley
http://www.apache.org/foundation/voting.html says that only PMC members can
make binding votes on code modification issues, but
http://hadoop.apache.org/bylaws.html says that Committers can make binding
votes on them. Does that mean the Hadoop bylaws have to change?
This may be a little atypical but I don't see any harm. The Hadoop
PMC is willing to respect the veto of any committer as binding. I'd
worry more if we tried to reduce vetoes to a subset of the PMC than
extend it to a superset.
Do you think this is problematic?
Doug
Matt Foley
2012-12-04 17:58:35 UTC
Permalink
Hi Doug,
I didn't read your email until this morning, but I spent time overnight
thinking about the Apache Way and reached similar conclusions. While
tooling is broader in scope than a single code change, it is a technical
choice that we all have to live with.

More importantly, "Community over Code" would suggest that if only slightly
less than 50% of the community is uncomfortable with adding Python to the
mix which is the Hadoop stack, then we probably shouldn't do it, regardless
of the technical merits.

Therefore, I withdraw the question.

We will search for other means of cleaning up the shellscript problem and
making all functionality work with parity in the Windows world. I am quite
partial to Allen Wittenauer's suggestion in
HADOOP-9082<https://issues.apache.org/jira/browse/HADOOP-9082?focusedCommentId=13507163&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13507163>
that
the scripts should be greatly simplified before dealing with the
cross-platform question. It is in many respects silly to have so much
functionality "on the side" instead of dealing with it forthrightly in core
code. In that spirit, I am also -1 on burying the same complexity in maven
plug-ins, which after all just adds another couple layers of complexity,
and limits the number of people who understand it, as well.

Thanks to all who voted and contributed to the discussion.
Best regards,
--Matt
Post by Doug Cutting
Hadoop's bylaws do draw finer distinctions than the Apache voting
guidelines document, but we follow the same general principles that
are described there.
As I understand it, the rationale for using consensus for code is that
everyone needs to agree on everything in the codebase or we've
disenfranchised some. We share a single code repository and we need
to all agree on what goes into it. A release does not require
majority since if someone doesn't agree on the timing of a release
they can choose to make another at a different time, but every change
that goes into each release requires consensus. We also require
consensus for committers and PMC member votes so that we have a group
that's coherent and is able to reach consensus on code changes.
Re-writing bash scripts in Python is neither a release nor other
procedural issue. It involves changes to the software we maintain and
seems to fall clearly into the "code change" category.
If you disagree then perhaps you'd like to propose a change to the
bylaws so that scripts have different rules than other kinds of
software, but I don't yet see the rationale for such a change.
Doug
Post by Matt Foley
No, but it speaks to whether the Hadoop bylaws can extend the Apache voting
procedures and draw finer distinctions. For example, the Apache voting
procedures only identify 3 types of votable issue, while the Hadoop bylaws
identify 9 types of votable issues.
If we were forced to fit "development tools" into one of the three
categories cited by the Apache voting procedures, it would be fitting a
square peg in a round hole. Since we can instead look at the 9 categories
provided by the Hadoop bylaws, we can acknowledge that "development tools"
was an overlooked category. But in my opinion it certainly doesn't fit
into the "code change" category. Tooling is a meta-issue regarding HOW we
do what needs to be done. In this case, whether we allow a
platform-independent solution, or force contributors to maintain parallel
scripts in multiple platform-specific languages for no reason.
--Matt
Post by Chris Nauroth
Post by Matt Foley
http://www.apache.org/foundation/voting.html says that only PMC members can
make binding votes on code modification issues, but
http://hadoop.apache.org/bylaws.html says that Committers can make binding
votes on them. Does that mean the Hadoop bylaws have to change?
This may be a little atypical but I don't see any harm. The Hadoop
PMC is willing to respect the veto of any committer as binding. I'd
worry more if we tried to reduce vetoes to a subset of the PMC than
extend it to a superset.
Do you think this is problematic?
Doug
Radim Kolar
2012-12-04 19:41:34 UTC
Permalink
So is the result of the vote to close
https://issues.apache.org/jira/browse/HADOOP-9073 and go with the groovy in
pom.xml variant (option number 2)?
Matt Foley
2012-12-04 20:28:27 UTC
Permalink
Please close HADOOP-9073 as "will not fix", citing this discussion.

I'm -1 on groovy in maven. That's worse, not better. Let it sit for a
while and let people propose simplifications of the script situation.

Thanks,
--Matt
result of vote is to close https://issues.apache.org/jira/browse/HADOOP-9073
and write groovy in pom.xml variant (option number 2)?
Alejandro Abdelnur
2012-12-04 21:00:58 UTC
Permalink
i've been playing around writing a couple of maven plugins, one to replace saveversion.sh and the other to invoke protoc. they both work in windows standard cmd (no cygwin required). together with hadoop-8887 they would remove most of the scripting done in the poms.

(they also work in linux and osx)

they are java based, only require having SVN GIT & PROTOC avail in the PATH.
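
For readers who want a feel for the kind of task under discussion, here is a rough sketch of a cross-platform "find the SCM tool on the PATH and ask it for the revision" step, the sort of thing saveVersion.sh does with shell. It is written in Python purely for illustration and is not the Java plugin code described above. The function names, the "Unknown" fallback string, and the parsing of the English "Revision:" label from `svn info` are all assumptions made for this example.

```python
import shutil
import subprocess

# Assumed placeholder for "no revision could be determined".
UNKNOWN = "Unknown"


def find_tool(name, path=None):
    """Return the full path of an executable found on PATH, or None.

    shutil.which takes care of platform details such as PATHEXT on Windows,
    so no shell or Cygwin is needed.
    """
    return shutil.which(name, path=path)


def parse_svn_revision(svn_info_output):
    """Pull the revision out of `svn info` output (assumes English labels)."""
    for line in svn_info_output.splitlines():
        if line.startswith("Revision:"):
            return line.split(":", 1)[1].strip()
    return None


def run_tool(argv, cwd):
    """Run a command without a shell; return stdout, or None on failure."""
    try:
        result = subprocess.run(argv, cwd=cwd, capture_output=True, text=True)
    except OSError:
        return None
    return result.stdout if result.returncode == 0 else None


def source_revision(source_dir, path=None):
    """Best-effort revision of a Git or Subversion checkout."""
    git = find_tool("git", path)
    if git:
        out = run_tool([git, "rev-parse", "HEAD"], source_dir)
        if out and out.strip():
            return out.strip()
    svn = find_tool("svn", path)
    if svn:
        out = run_tool([svn, "info"], source_dir)
        revision = parse_svn_revision(out) if out else None
        if revision:
            return revision
    return UNKNOWN
```

Calling `source_revision(".")` from a checkout would return the Git commit hash or the Subversion revision number if the matching tool is on the PATH, and the placeholder otherwise. The same lookup-then-invoke pattern is what a Java plugin would do with its own process API.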

if cmake works in windows, i assume hadoop-8887 would be almost there.

this would leave the tar stitching, which is done as a script to handle SO symlinks. though i have an idea on how we could take care of it.

i'll be creating a jira momentarily.

thx

Alejandro
Post by Matt Foley
Please close HADOOP-9073 as "will not fix", citing this discussion.
I'm -1 on groovy in maven. That's worse, not better. Let it sit for a
while and let people propose simplifications of the script situation.
Thanks,
--Matt
result of vote is to close https://issues.apache.org/jira/browse/HADOOP-9073
and write groovy in pom.xml variant (option number 2)?
Matt Foley
2012-12-04 22:35:24 UTC
Permalink
There's already a jira:
HADOOP-8924<https://issues.apache.org/jira/browse/HADOOP-8924>
Post by Alejandro Abdelnur
i've been playing around writing a couple of maven plugins, one to replace
saveversion.sh and the other to invoke protoc. they both work in windows
standard cmd (no cygwin required). together with hadoop-8887 they would
remove most of the scripting done in the poms.
(they also work in linux and osx)
they are java based, only require having SVN GIT & PROTOC avail in the PATH.
if cmake works in windows, i assume hadoop-8887 would be almost there.
this would leave the tar stitching, which is done as script to handle SO
symlinks. though i have an idea on how we could take care of it.
i'll be creating a jira momentarily.
thx
Alejandro
Post by Matt Foley
Please close HADOOP-9073 as "will not fix", citing this discussion.
I'm -1 on groovy in maven. That's worse, not better. Let it sit for a
while and let people propose simplifications of the script situation.
Thanks,
--Matt
result of vote is to close https://issues.apache.org/jira/browse/HADOOP-9073
and write groovy in pom.xml variant (option number 2)?